Thanks for the update, Jesse. Let us know about any features Culvert needs from HBase.
After cloning Culvert, I got:

[INFO] Culvert - Accumulo Integration .................... FAILURE [0.431s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:06.638s
[INFO] Finished at: Thu Dec 22 13:51:34 PST 2011
[INFO] Final Memory: 20M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project culvert-accumulo: Could not resolve dependencies for project com.bah.culvert:culvert-accumulo:jar:0.4.0-SNAPSHOT: Could not find artifact org.apache.accumulo:accumulo-core:jar:1.4.0-incubating-SNAPSHOT in apache-snapshots (http://repository.apache.org/snapshots/) -> [Help 1]

Can someone provide a hint?

On Thu, Dec 22, 2011 at 11:44 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:

> Culvert was originally introduced at Hadoop Summit 2011, but recent updates
> have made it very applicable to current systems. Recently, we added support
> for Accumulo and upgraded HBase support to 0.92. Since Hadoop Summit, there
> has also been significant code cleanup, and we added some small features.
> However, we found that most people hadn't heard of Culvert, so we
> wanted to re-release the framework.
>
> For an introduction to using Culvert, check out the blog post here:
> http://jyates.github.com/2011/11/17/intro-to-culvert.html
>
> Also, the original presentation (where we discuss the internals) is
> available on slideshare:
> <http://www.slideshare.net/jesse_yates/culvert-a-robust-framework-for-secondary-indexing-of-structured-and-unstructured-data>
>
> There is a Culvert hackathon in the middle of January:
> http://culverthackathon2012.eventbrite.com/
>
> Oh, and you can find the code on
> github <https://github.com/booz-allen-hamilton/culvert>.
>
> Below is an overview of why we wrote Culvert and what it does.
>
> Secondary indexing is a common design pattern in BigTable-like databases
> that allows users to index one or more columns in a table. This technique
> enables fast search of records in a database based on a particular column
> instead of the row id, thus enabling relational-style semantics in a NoSQL
> environment. Frequently, the index is stored either in a reserved namespace
> in the table or in a separate index table.
>
> Although this is a common design pattern in BigTable-based
> applications, most implementations to date have been
> tightly coupled with a particular application. As a result, few
> general-purpose frameworks for secondary indexing on BigTable-like
> databases exist, and those that do are tied to a particular implementation
> of the BigTable model.
>
> There are several existing tools (Solr, Lily), but these are focused on
> text-based search and are restricted to indexes created
> through their own frameworks. What if you want to use your existing indexes? Or
> leverage the indexes to do complex queries?
>
> We developed a solution to this problem called Culvert that supports online
> index updates as well as a variation of the Hive query language. In
> designing Culvert, we sought to make the solution pluggable so that it can
> be used on any of the many BigTable-like databases (HBase, Cassandra,
> etc.). Furthermore, it is easily extensible to existing, hand-rolled
> indexes.
>
> As well as being a secondary indexing framework, it is also a query
> execution mechanism - think pig/hive minus the fancy command line. We
> support a subset of SQL, but are able to take full advantage of home-rolled
> and built-in indexes, leading to query execution times potentially orders
> of magnitude smaller than existing approaches, and certainly with orders of
> magnitude less effort.
>
> -- Jesse
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates
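
For anyone on the list new to the pattern, the secondary-indexing idea in the quoted announcement (a separate index table that maps a column's values back to row ids) might look roughly like the sketch below. It is only an in-memory illustration: the class and method names are hypothetical, sorted maps stand in for the tables, and none of it is Culvert's or HBase's actual API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a BigTable-like store; not Culvert's API.
final class SecondaryIndexSketch {
    // Primary table: row id -> (column name -> value), sorted by row id.
    private final Map<String, Map<String, String>> primary = new TreeMap<>();
    // Index table: value of the indexed column -> row ids holding that value.
    private final TreeMap<String, Set<String>> index = new TreeMap<>();
    private final String indexedColumn;

    SecondaryIndexSketch(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    // Write a row and update the index table in the same call.
    void put(String rowId, Map<String, String> columns) {
        Map<String, String> old = primary.put(rowId, new TreeMap<>(columns));
        if (old != null && old.containsKey(indexedColumn)) {
            // The row is being overwritten: drop its stale index entry first.
            String oldValue = old.get(indexedColumn);
            Set<String> ids = index.get(oldValue);
            if (ids != null) {
                ids.remove(rowId);
                if (ids.isEmpty()) {
                    index.remove(oldValue);
                }
            }
        }
        String value = columns.get(indexedColumn);
        if (value != null) {
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowId);
        }
    }

    // Find row ids by column value without scanning the primary table.
    Set<String> rowIdsFor(String value) {
        Set<String> ids = index.getOrDefault(value, Collections.emptySet());
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch("city");
        table.put("row1", Map.of("city", "Boston", "name", "Ann"));
        table.put("row2", Map.of("city", "Austin", "name", "Bob"));
        table.put("row3", Map.of("city", "Boston", "name", "Cy"));
        // Looks up by the "city" column instead of by row id.
        System.out.println(table.rowIdsFor("Boston"));
    }
}
```

In a real store the index write is a second table write that has to be kept consistent with the primary write, which is presumably the part a framework such as Culvert takes care of.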