Anyone? Any help understanding this package is appreciated. Thanks, T
On Thu, May 28, 2009 at 3:18 PM, Tenaali Ram <tenaali...@gmail.com> wrote:
> Hi,
>
> I am trying to understand the code of the index package to build a
> distributed Lucene index. I have some very basic questions and would
> really appreciate it if someone could help me understand this code:
>
> 1) If I already have a Lucene index (divided into shards), should I
> upload these indexes into HDFS and provide their location, or will the
> code pick up these shards from the local file system?
>
> 2) How does the code add a document to the Lucene index? I can see there
> is an index selection policy. Assuming the round-robin policy is chosen,
> how does the code add a document to the Lucene index? This is related to
> the first question: is the index where the new document is to be added
> in HDFS or on the local file system? I read in the README that the index
> is first created on the local file system and then copied back to HDFS.
> Can someone please point me to the code that does this?
>
> 3) After the MapReduce job finishes, where are the final indexes? In HDFS?
>
> 4) Correct me if I am wrong: the code builds multiple indexes, where each
> index is an instance of a Lucene index holding a disjoint subset of
> documents from the corpus. So, if I have to search for a term, I have to
> search each index and then merge the results. If this is correct, then
> how is the IDF of a term, which is a global statistic, computed and
> updated in each index? Each index can compute the IDF with respect to
> the subset of documents it has, but cannot compute the global IDF of a
> term (since it knows nothing about other indexes, which might contain
> the same term in other documents).
>
> Thanks,
> -T
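On question 2, a round-robin selection policy in general just rotates through the shards so each new document lands on the next shard in turn. Here is a minimal sketch of that idea; the class and method names are illustrative, not the actual contrib/index API:

```java
// Hypothetical sketch of a round-robin shard selection policy.
// It is NOT the contrib/index implementation, just the general idea:
// each incoming document is assigned to the next shard in rotation,
// spreading documents evenly across disjoint sub-indexes.
public class RoundRobinSelector {
    private final int numShards;
    private int next = 0;

    public RoundRobinSelector(int numShards) {
        this.numShards = numShards;
    }

    // Returns the shard index for the next document, then advances
    // the cursor, wrapping back to shard 0 after the last shard.
    public synchronized int chooseShard() {
        int shard = next;
        next = (next + 1) % numShards;
        return shard;
    }

    public static void main(String[] args) {
        RoundRobinSelector sel = new RoundRobinSelector(3);
        for (int doc = 0; doc < 6; doc++) {
            System.out.println("doc " + doc + " -> shard " + sel.chooseShard());
        }
    }
}
```

With 3 shards, documents 0..5 go to shards 0, 1, 2, 0, 1, 2 - a disjoint, evenly balanced partition of the corpus.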
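On question 4, one common way to get a global IDF over disjoint shards (outside of whatever this package does) is to aggregate the per-shard statistics at query time: sum the shards' document counts and the term's per-shard document frequencies, then apply the IDF formula to the totals. A sketch, assuming each shard can report those two numbers (the method is illustrative; the formula is Lucene's classic 1 + ln(N / (df + 1))):

```java
// Hypothetical sketch of computing a global IDF across shards by
// summing per-shard statistics. Assumes each shard can report its
// local document count and the term's local document frequency.
public class GlobalIdf {
    public static double idf(long[] shardDocCounts, long[] shardDocFreqs) {
        long n = 0;   // total documents across all shards
        long df = 0;  // total documents containing the term
        for (long c : shardDocCounts) n += c;
        for (long f : shardDocFreqs)  df += f;
        // Lucene's classic idf: 1 + ln(N / (df + 1))
        return 1.0 + Math.log((double) n / (double) (df + 1));
    }

    public static void main(String[] args) {
        // Three shards with 100, 200, and 300 docs; the term appears
        // in 10, 20, and 30 of them respectively.
        double idf = idf(new long[]{100, 200, 300},
                         new long[]{10, 20, 30});
        System.out.println("global idf = " + idf);
    }
}
```

Because the shards hold disjoint document subsets, the sums are exact; the remaining design choice is whether to gather these statistics in an extra round-trip per query or to precompute and distribute them to the shards.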