Dennis Kubes wrote:
You would build the indexes on Hadoop but then move them to local file systems for searching. You wouldn't want to perform searches using the DFS.
Creating Lucene indexes directly in DFS would be pretty slow. Nutch creates them locally, then copies them to DFS to avoid this.
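As a rough illustration of that pattern, a minimal sketch using the Hadoop FileSystem API; the index is assumed to have already been built in a local directory, and both paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The index was built locally (e.g. with an ordinary FSDirectory);
// push the finished files into DFS in a single copy.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path localIndex = new Path("/tmp/index-part-00000");          // hypothetical local path
Path dfsIndex   = new Path("/user/nutch/indexes/part-00000"); // hypothetical DFS path
fs.copyFromLocalFile(localIndex, dfsIndex);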
One could create a Lucene Directory implementation optimized for updates, where new files are written locally, and only flushed to DFS when the Directory is closed. When updating, Lucene creates and reads lots of files that might not last very long, so there's little point in replicating them on the network. For many applications, that should be considerably faster than either updating indexes directly in HDFS, or copying the entire index locally, modifying it, then copying it back.
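To make the lifecycle concrete, here is a minimal sketch of the "write locally, flush to DFS on close" idea. It is deliberately not a real Lucene Directory subclass (that would depend on the Lucene version's abstract methods); it only shows the shape: Lucene does all its short-lived reads and writes in a local working directory, and whatever files survive are copied to HDFS once, at close. Class and method names are hypothetical.

import java.io.Closeable;
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical sketch: buffer index updates on local disk, flush to DFS on close. */
public class LocalThenDfsFlush implements Closeable {
  private final File localWorkDir;  // where Lucene actually reads and writes
  private final Path dfsTarget;     // final location of the index in DFS
  private final FileSystem fs;

  public LocalThenDfsFlush(File localWorkDir, Path dfsTarget, Configuration conf)
      throws IOException {
    this.localWorkDir = localWorkDir;
    this.dfsTarget = dfsTarget;
    this.fs = FileSystem.get(conf);
  }

  /** Local directory to hand to Lucene (e.g. wrapped in a local FSDirectory). */
  public File workDir() {
    return localWorkDir;
  }

  /**
   * Copy every surviving file to DFS. Transient files that Lucene created
   * and deleted during merging never touch the network.
   */
  @Override
  public void close() throws IOException {
    File[] files = localWorkDir.listFiles();
    if (files == null) {
      return; // nothing was written
    }
    for (File f : files) {
      fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
                           new Path(dfsTarget, f.getName()));
    }
  }
}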
Lucene search works from HDFS-resident indexes, but it is slow, especially if the indexes were created on a different node than the one searching them. (HDFS tries to write one replica of each block locally on the node where it is created.)
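The search-side counterpart of the copy above, again with hypothetical paths: pull the index out of DFS onto the search node's local disk and open that copy with a normal Lucene FSDirectory/IndexSearcher, rather than searching it in place over HDFS.

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path dfsIndex   = new Path("/user/nutch/indexes/part-00000");  // hypothetical DFS path
Path localIndex = new Path("/local/search/index/part-00000");  // hypothetical local path
fs.copyToLocalFile(dfsIndex, localIndex);
// then open the local copy with Lucene as usual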
Doug
