Lucene Gdata -- the best way to store the feeds / entries

Simon Willnauer Sat, 27 May 2006 16:33:50 -0700

For those who haven't heard about the GData project, please check
today's mailing list.
The Lucene indexer is supposed to be used as the search component of
this implementation. GData is an extension to the Atom/RSS formats
that adds search and a kind of versioning, and this project is a
server-side implementation of the protocol. So what's the problem?
The incoming feed entries and their updates have to be stored
somewhere in persistent storage. The easiest approach would be flat
file storage, which is not sufficient in my eyes. I thought about
using an approach similar to the Nutch distributed file system:
indexing the incoming entries in a "searchable" index and storing the
whole entry in an associated index, to prevent the searchable index
from growing too fast.
To keep the index small I would create a separate index for each feed
instance which is organized in the local file system.
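
A minimal sketch of how the per-feed layout could be mapped onto the
local file system, assuming one directory per feed with a "search"
and a "storage" subdirectory. The class name, the directory names and
the id sanitizing are hypothetical and only illustrate the idea; no
Lucene calls are involved here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: maps a feed id to its pair of index directories. */
public final class FeedIndexLayout {
    private final Path root;

    public FeedIndexLayout(Path root) {
        this.root = root;
    }

    /**
     * Turns a feed id into a directory name. Anything outside
     * [A-Za-z0-9_-] becomes '_', so two different ids can collide;
     * a real implementation would need a collision-free encoding.
     */
    static String safeName(String feedId) {
        if (feedId == null || feedId.isEmpty()) {
            throw new IllegalArgumentException("feed id must not be empty");
        }
        return feedId.replaceAll("[^A-Za-z0-9_-]", "_");
    }

    /** Directory of the small index that is actually searched. */
    public Path searchIndexDir(String feedId) {
        return root.resolve(safeName(feedId)).resolve("search");
    }

    /** Directory of the associated index that holds the whole entries. */
    public Path storageIndexDir(String feedId) {
        return root.resolve(safeName(feedId)).resolve("storage");
    }
}
```

Each directory would then be opened as its own index, so a search
only touches the small "search" index of the feed in question.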
I would be interested to hear whether anybody has experience with
retrieving large data, like whole feed entries, out of a "storage"
Lucene index. Should I expect any performance problems with this
approach?
As far as I know, Lucene doesn't support any versioning, or did that
change by any chance? Well, the protocol description doesn't say
anything about retrieving old versions (the documentation only talks
about optimistic locking / updating versions).
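
To illustrate the optimistic locking part: a minimal sketch, assuming
only the latest version of an entry is kept and an update must name
the version it was based on. This is an in-memory stand-in, not
Lucene and not the GData wire protocol; the class and method names
are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical store that keeps only the latest version of each entry. */
public final class VersionedEntryStore {

    /** Immutable snapshot of an entry at one version. */
    public static final class Entry {
        public final int version;
        public final String content;

        Entry(int version, String content) {
            this.version = version;
            this.content = content;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();

    /** Stores a new entry as version 1 and returns that version. */
    public int insert(String id, String content) {
        if (entries.putIfAbsent(id, new Entry(1, content)) != null) {
            throw new IllegalStateException("entry already exists: " + id);
        }
        return 1;
    }

    /**
     * Replaces the entry only if the caller saw the current version.
     * A stale expectedVersion is rejected instead of overwriting.
     */
    public int update(String id, int expectedVersion, String content) {
        Entry current = entries.get(id);
        if (current == null) {
            throw new IllegalArgumentException("no such entry: " + id);
        }
        Entry next = new Entry(expectedVersion + 1, content);
        // replace() is atomic and compares by identity here, so a
        // concurrent writer that got in first makes this call fail.
        if (current.version != expectedVersion || !entries.replace(id, current, next)) {
            throw new ConcurrentModificationException("version conflict on " + id);
        }
        return next.version;
    }

    /** Returns the latest version of the entry, or null if unknown. */
    public Entry get(String id) {
        return entries.get(id);
    }
}
```

With this scheme no old versions need to be retrievable; the version
number only serves to detect conflicting updates.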


regards Simon
