There seem to be a few other players in this space too. Are you from Rackspace? (http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop- query-terabytes-data)
AOL also has a Hadoop/Solr project going on. CNET does not have much brewing there. Although Yonik and I had talked about it a bunch -- but that was long ago. --cw Clay Webster tel:1.908.541.3724 Associate VP, Platform Infrastructure http://www.cnet.com CNET, Inc. (Nasdaq:CNET) mailto:[EMAIL PROTECTED] > -----Original Message----- > From: Ning Li [mailto:[EMAIL PROTECTED] > Sent: Wednesday, February 06, 2008 1:57 PM > To: [email protected]; [EMAIL PROTECTED]; solr- > [EMAIL PROTECTED] > Subject: Lucene-based Distributed Index Leveraging Hadoop > > There have been several proposals for a Lucene-based distributed index > architecture. > 1) Doug Cutting's "Index Server Project Proposal" at > http://www.mail-archive.com/[email protected]/msg00338.html > 2) Solr's "Distributed Search" at > http://wiki.apache.org/solr/DistributedSearch > 3) Mark Butler's "Distributed Lucene" at > http://wiki.apache.org/hadoop/DistributedLucene > > We have also been working on a Lucene-based distributed index > architecture. > Our design differs from the above proposals in the way it leverages > Hadoop > as much as possible. In particular, HDFS is used to reliably store > Lucene > instances, Map/Reduce is used to analyze documents and update Lucene > instances > in parallel, and Hadoop's IPC framework is used. Our design is geared > for > applications that require a highly scalable index and where batch > updates > to each Lucene instance are acceptable (verses finer-grained document > at > a time updates). > > We have a working implementation of our design and are in the process > of evaluating its performance. An overview of our design is provided > below. > We welcome feedback and would like to know if you are interested in > working > on it. If so, we would be happy to make the code publicly available. At > the > same time, we would like to collaborate with people working on existing > proposals and see if we can consolidate our efforts. > > TERMINOLOGY > A distributed "index" is partitioned into "shards". Each shard > corresponds > to > a Lucene instance and contains a disjoint subset of the documents in > the > index. > Each shard is stored in HDFS and served by one or more "shard servers". > Here > we only talk about a single distributed index, but in practice multiple > indexes > can be supported. > > A "master" keeps track of the shard servers and the shards being served > by > them. An "application" updates and queries the global index through an > "index client". An index client communicates with the shard servers to > execute a query. > > KEY RPC METHODS > This section lists the key RPC methods in our design. To simplify the > discussion, some of their parameters have been omitted. > > On the Shard Servers > // Execute a query on this shard server's Lucene instance. > // This method is called by an index client. > SearchResults search(Query query); > > On the Master > // Tell the master to update the shards, i.e., Lucene instances. > // This method is called by an index client. > boolean updateShards(Configuration conf); > > // Ask the master where the shards are located. > // This method is called by an index client. > LocatedShards getShardLocations(); > > // Send a heartbeat to the master. This method is called by a > // shard server. In the response, the master informs the > // shard server when to switch to a newer version of the index. > ShardServerCommand sendHeartbeat(); > > QUERYING THE INDEX > To query the index, an application sends a search request to an index > client. > The index client then calls the shard server search() method for each > shard > of the index, merges the results and returns them to the application. > The > index client caches the mapping between shards and shard servers by > periodically calling the master's getShardLocations() method. > > UPDATING THE INDEX USING MAP/REDUCE > To update the index, an application sends an update request to an index > client. > The index client then calls the master's updateShards() method, which > schedules > a Map/Reduce job to update the index. The Map/Reduce job updates the > shards > in > parallel and copies the new index files of each shard (i.e., Lucene > instance) > to HDFS. > > The updateShards() method includes a "configuration", which provides > information for updating the shards. More specifically, the > configuration > includes the following information: > - Input path. This provides the location of updated documents, e.g., > HDFS > files or directories, or HBase tables. > - Input formatter. This specifies how to format the input documents. > - Analysis. This defines the analyzer to use on the input. The > analyzer > determines whether a document is being inserted, updated, or > deleted. > For > inserts or updates, the analyzer also converts each input document > into > a Lucene document. > > The Map phase of the Map/Reduce job formats and analyzes the input (in > parallel), while the Reduce phase collects and applies the updates to > each > Lucene instance (again in parallel). The updates are applied using the > local > file system where a Reduce task runs and then copied back to HDFS. For > example, > if the updates caused a new Lucene segment to be created, the new > segment > would be created on the local file system first, and then copied back > to > HDFS. > > When the Map/Reduce job completes, a "new version" of the index is > ready to > be > queried. It is important to note that the new version of the index is > not > derived from scratch. By leveraging Lucene's update algorithm, the new > version > of each Lucene instance will share as many files as possible as the > previous > version. > > ENSURING INDEX CONSISTENCY > At any point in time, an index client always has a consistent view of > the > shards in the index. The results of a search query include either all > or > none > of a recent update to the index. The details of the algorithm to > accomplish > this are omitted here, but the basic flow is pretty simple. > > After the Map/Reduce job to update the shards completes, the master > will > tell > each shard server to "prepare" the new version of the index. After all > the > shard servers have responded affirmatively to the "prepare" message, > the new > > index is ready to be queried. An index client will then lazily learn > about > the new index when it makes its next getShardLocations() call to the > master. > > In essence, a lazy two-phase commit protocol is used, with "prepare" > and > "commit" messages piggybacked on heartbeats. After a shard has switched > to > the new index, the Lucene files in the old index that are no longer > needed > can safely be deleted. > > ACHIEVING FAULT-TOLERANCE > We rely on the fault-tolerance of Map/Reduce to guarantee that an index > update > will eventually succeed. All shards are stored in HDFS and can be read > by > any > shard server in a cluster. For a given shard, if one of its shard > servers > dies, > new search requests are handled by its surviving shard servers. To > ensure > that > there is always enough coverage for a shard, the master will instruct > other > shard servers to take over the shards of a dead shard server. > > PERFORMANCE ISSUES > Currently, each shard server reads a shard directly from HDFS. > Experiments > have shown that this approach does not perform very well, with HDFS > causing > Lucene to slow down fairly dramatically (by well over 5x when data > blocks > are > accessed over the network). Consequently, we are exploring different > ways to > leverage the fault tolerance of HDFS and, at the same time, work around > its > performance problems. One simple alternative is to add a local file > system > cache on each shard server. Another alternative is to modify HDFS so > that an > application has more control over where to store the primary and > replicas of > an HDFS block. This feature may be useful for other HDFS applications > (e.g., > HBase). We would like to collaborate with other people who are > interested in > adding this feature to HDFS. > > > Regards, > Ning Li
