I did not follow this discussion from the start, but I guess you could cleanly achieve this by implementing org.apache.lucene.index.FilterIndexReader
have fun. e. ----- Original Message ---- From: 仇寅 <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Sunday, 2 March, 2008 3:05:05 AM Subject: Re: Does Lucene support partition-by-keyword indexing? Hi, I agree with your point that it is easier to partition index by document. But the partition-by-keyword approach has much greater scalability over the partition-by-document approach. Each query involves communicating with constant number of nodes; while partition-by-doc requires spreading the query a long all or many of the nodes. And I am actually doing some small research on this. By the way, the documents to be indexed are not necessarily web pages. They are mostly files stored on each node's file system. Node failures are also handled by replicas. The index for each term will be replicated on multiple nodes, whose nodeIDs are near to each other. This mechanism is handled by the underlying DHT system. So any idea how can partition index by keyword in lucene? Thanks. On Sun, Mar 2, 2008 at 5:50 AM, Mathieu Lecarme <[EMAIL PROTECTED]> wrote: > The easiest way is to split index by Document. In Lucene, index > contains Document and inverse index of Term. If you wont to put Term > in different place, Document will be duplicated on each index, with > only a part of their Term. > > How will you manage node failure in your network? > > They were some trial to build big p2p search engine to compet with > Google, but, it will be easier to split by Document. > > If you have to many computers and want to see them working together, > why don't use Nutch with Hadoop? > > M. > Le 1 mars 08 à 19:16, Yin Qiu a écrit : > > > Hi, > > > > I'm planning to implement a search infrastructure on a P2P overlay. To > > achieve this, I want to first distribute the indices to various nodes > > connected by this overlay. My approach is to partition the indices by > > keyword, that is, one node takes care of certain keywords (or > > terms). When a > > simple TermQuery is encountered, we just find the node associated > > with that > > term (with distributed hash table) and get the result. And suppose a > > BooleanQuery is issued, we contact all the nodes involved in this > > query and > > finally merge the result. > > > > So my question is: does Lucene support partitioning the indices by > > keywords? > > > > Thanks in advance. > > > > -- > > Look before you leap > > ------------------------------------------- > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Look before you leap ------------------------------------------- ___________________________________________________________ Rise to the challenge for Sport Relief with Yahoo! For Good http://uk.promotions.yahoo.com/forgood/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]