---------- Forwarded message ----------
From: John Fawcett <[EMAIL PROTECTED]>
Date: Wed, 25 Oct 2006 08:39:23 -0400
Subject: Client-Server Lucene - DocumentWriter
To: [EMAIL PROTECTED]

Hi,

I have a design challenge in my own application's use of Lucene, which triggered an idea for distributed Lucene indexing. Below, I've summarized the design challenge, and then the indexing idea.

My team is working on a client/server application. The server is a Java application, and the client is in C#/.NET. Right now we are adding the capability for offline operation of the client. Search is part of this work, so we have been working with Lucene.net to port some of our online search capabilities to offline use. The client only holds a subset of the data held on the server, so we'd like to move a subset of the search index to the client.

There are two types of transfers: bulk and incremental. Our goal in both is to offload as much work as possible from the client to the server.

Bulk transfers happen when a client is initializing for offline use, or resynchronizing after returning online. In these scenarios we plan to create a new index on the server and just send the files to the client. The client will then have to perform an index merge.

Incremental adds happen when the client application is online. New documents are transferred to the client asynchronously. Currently, we are transferring a document's extracted text. However, the client still has to perform analysis, inversion, and addition to the index.

Looking through the code for IndexWriter, I found the DocumentWriter class. DocumentWriter does the inversion and stores the result in a set of integer arrays and an array of "Posting" objects. Looking through the class, it seems like the inversion info could be serialized from server to client pretty easily. The serialized data from DocumentWriter would be a portable "index record" for a single document. Our hope is that we can send this index record from the server to the client. The idea is to reduce the work on the client to only the insertion of the inverted document into the local index.

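To make the "index record" idea concrete, here is a minimal sketch in plain Java of what inverting a single document and serializing the result might look like. It does not use Lucene's DocumentWriter or its internal Posting class; the class name PortableIndexRecord, its methods, and the wire layout are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: this is not Lucene's DocumentWriter
// and the byte layout below is not a Lucene file format.
class PortableIndexRecord {

    // Server side: invert one analyzed document by mapping each term
    // to the token positions at which it occurs.
    static Map<String, int[]> invert(String[] tokens) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        Map<String, int[]> postings = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            postings.put(e.getKey(),
                    e.getValue().stream().mapToInt(Integer::intValue).toArray());
        }
        return postings;
    }

    // Server side: write term count, then (term, frequency, positions) per term.
    static byte[] serialize(Map<String, int[]> postings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(postings.size());
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                for (int p : e.getValue()) {
                    out.writeInt(p);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Client side: rebuild the postings without re-running analysis or inversion.
    static Map<String, int[]> deserialize(byte[] wire) {
        Map<String, int[]> postings = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            int termCount = in.readInt();
            for (int i = 0; i < termCount; i++) {
                String term = in.readUTF();
                int[] termPositions = new int[in.readInt()];
                for (int j = 0; j < termPositions.length; j++) {
                    termPositions[j] = in.readInt();
                }
                postings.put(term, termPositions);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return postings;
    }

    public static void main(String[] args) {
        String[] tokens = {"lucene", "index", "lucene", "client"};
        Map<String, int[]> received = deserialize(serialize(invert(tokens)));
        received.forEach((term, p) -> System.out.println(term + " " + Arrays.toString(p)));
    }
}
```

In this sketch the record depends only on the document's own tokens, which is the property I'm hoping holds for the real DocumentWriter data. A C# client would need a reader that matches this byte layout (DataOutputStream writes big-endian integers and modified UTF-8 strings), so a real implementation would probably want an explicitly specified format.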
Having a portable index "record" for an individual document seems very useful, especially for distributed indexing. I can imagine running a farm of indexers that only invert documents and send them to a set of search machines that maintain indexes and field search queries.

Is this something that could be added to the Lucene framework? Is the "search record" data calculated in DocumentWriter in any way dependent on the contents of the index? Will this actually save us many client cycles?

Thanks,
fawce