Re: Observations from Cloudera Search

Patrick Hunt Thu, 25 Jul 2013 10:49:43 -0700

Hi Rahul, I'm guessing you saw my talk, given that you're in the Bay Area?

Cloudera Search is based on Solr, which uses ZooKeeper to store
information about the cluster, its collections, and so on. Our
integrations with Flume, HBase, MapReduce, etc. all use the ZK
service and the information it holds to identify the collections and
servers when sending documents for indexing. Everything is built on
the standard Solr APIs for indexing.

Blur was the first Lucene-based search solution to run directly on
HDFS, and we've taken some of that code (the Lucene "Directory"
implementation for HDFS) as the starting point for our implementation.
This has been contributed back to the Solr community (see the recent
announcement for Solr 4.4, e.g.
http://searchhub.org/2013/07/25/solr-4-4-went-live-this-week-a-brief-summary/).
We are working toward, and hope for, further collaboration between the
Blur and Solr/Lucene communities.

Patrick

On Wed, Jul 24, 2013 at 9:11 PM, rahul challapalli wrote:
> Hi,
>
> I attended a talk from Cloudera about their search solution. One thing
> that stood out was their NRT indexing. They have multiple integration
> points (Flume, HBase) that let them index data as it is written to
> HDFS, in addition to MapReduce-based ad hoc batch indexing. One thing
> that was not clear was how (if at all) they store metadata about
> indexes (analogous to our TableDescriptor).
>
> Also, when I started a conversation afterward, I was told that they
> had collaborated with the Blur team, which I was not aware of.
>
> - Rahul
