For those who haven't heard about the GData project, please check today's mailing list. GData is an extension to the Atom/RSS format that adds search and a kind of versioning. This project is a server-side implementation of the protocol, and the Lucene indexer is supposed to be used as its search component.

So what's the problem? The incoming feed entries and their updates have to be stored somewhere in persistent storage. The easiest approach would be flat-file storage, which is not sufficient in my eyes. I thought about using an approach similar to the Nutch distributed file system: index the incoming entries in a "searchable" index and store the whole entry in an associated index, to prevent the searchable index from growing too fast. To keep the indexes small, I would create a separate index for each feed instance, organized in the local file system.

I would be interested to hear whether anybody has experience with retrieving large data, such as whole feed entries, out of a "storage" Lucene index. Should I expect any performance problems with this approach?

As far as I know, Lucene doesn't support any versioning, or did that change by any chance? The protocol description doesn't say anything about retrieving old versions (the documentation only covers optimistic locking, i.e. updating versions).
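The layout described above could be sketched roughly as follows. This is not Lucene code: plain Java maps stand in for the two indexes, purely to show the split between a small searchable side and an id-keyed storage side, plus an optimistic version check on update. All names here (FeedStore, StoredEntry, insert, update, search, fetch) are hypothetical, the tokenizer is deliberately naive, and one FeedStore instance is assumed per feed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical per-feed store; maps stand in for the two Lucene indexes. */
final class FeedStore {

    /** Full entry body plus its current version (assumed record layout). */
    static final class StoredEntry {
        final String body;
        final int version;

        StoredEntry(String body, int version) {
            this.body = body;
            this.version = version;
        }
    }

    // "Searchable" side: token -> ids of entries containing it. No bodies here.
    private final Map<String, Set<String>> searchIndex = new HashMap<>();
    // "Storage" side: entry id -> whole entry, only ever looked up by id.
    private final Map<String, StoredEntry> storage = new HashMap<>();

    /** Stores a new entry as version 1; fails if the id is already taken. */
    int insert(String id, String body) {
        if (storage.containsKey(id)) {
            throw new IllegalStateException("entry already exists: " + id);
        }
        storage.put(id, new StoredEntry(body, 1));
        indexTokens(id, body);
        return 1;
    }

    /** Optimistic update: succeeds only if the caller saw the current version. */
    int update(String id, String body, int expectedVersion) {
        StoredEntry current = storage.get(id);
        if (current == null) {
            throw new IllegalStateException("no such entry: " + id);
        }
        if (current.version != expectedVersion) {
            throw new IllegalStateException("version conflict on " + id);
        }
        removeTokens(id);
        // The old version is overwritten; no history is kept.
        storage.put(id, new StoredEntry(body, current.version + 1));
        indexTokens(id, body);
        return current.version + 1;
    }

    /** Step 1: ask the small searchable side for matching entry ids (sorted). */
    List<String> search(String term) {
        Set<String> ids = searchIndex.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        return new ArrayList<>(new TreeSet<>(ids));
    }

    /** Step 2: load the whole entry from the storage side, or null if absent. */
    StoredEntry fetch(String id) {
        return storage.get(id);
    }

    // Naive tokenizer: lower-case, split on non-word characters.
    private void indexTokens(String id, String body) {
        for (String token : body.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                searchIndex.computeIfAbsent(token, t -> new HashSet<>()).add(id);
            }
        }
    }

    // Drops every posting for the entry, then discards tokens left without entries.
    private void removeTokens(String id) {
        for (Set<String> ids : searchIndex.values()) {
            ids.remove(id);
        }
        searchIndex.values().removeIf(Set::isEmpty);
    }
}
```

A query would first call search(...) to get ids and then fetch(...) each hit, so the searchable side never has to carry the full entry bodies. Because update overwrites the stored entry, this sketch keeps only the latest version, which matches the optimistic-locking reading of the protocol but would not allow retrieving old versions.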
Regards,
Simon