Hi Rahul,

I'm guessing you saw my talk, given you're in the Bay Area? Cloudera Search is based on Solr, which uses ZooKeeper to store information about the cluster, its collections, and so on. Our integrations with Flume, HBase, MapReduce, etc. all use that ZooKeeper service and information to identify the collections and servers when outputting documents for indexing. Everything is based on the standard Solr APIs for indexing.

Blur was the first search (Lucene) based solution to run directly on HDFS, and we've taken some of that code (the Lucene "Directory" implementation for HDFS) as the starting point for our implementation. This has been contributed back to the Solr community (see the recent announcement for Solr 4.4, e.g. http://searchhub.org/2013/07/25/solr-4-4-went-live-this-week-a-brief-summary/). We are working on, and hope for, further collaboration between the Blur and Solr/Lucene communities.

Patrick

On Wed, Jul 24, 2013 at 9:11 PM, rahul challapalli <[email protected]> wrote:
> Hi,
>
> I attended a talk from Cloudera about their search solution. One thing
> which was striking was their NRT indexing. They have multiple integration
> points (Flume, HBase) which enable them to index the data as and when it
> is written to HDFS, apart from MapReduce-based ad hoc batch indexing. One
> thing which was not clear was how (if at all) they store metadata (analogous
> to our TableDescriptor) about indexes.
>
> Also, upon just starting a conversation later, I was told that they
> collaborated with the Blur team, which I was not aware of.
>
> - Rahul
