If the lazy field loading patch gets applied (which it should soon), you would see less of a performance hit for storing items in Lucene, at least when just getting hits. And you could compress the stored feeds too.
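
To give a rough idea of what I mean: below is a sketch of how fetching a hit could look once lazy loading is in. Since the patch isn't committed yet, the class and field names (SetBasedFieldSelector, "entryId", "entryXml") and the index path are just my guesses for illustration:

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.document.SetBasedFieldSelector;
    import org.apache.lucene.index.IndexReader;

    public class LazyHitSketch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/var/gdata/index/myFeed");

            // Load the small id field eagerly, defer the large stored entry body.
            Set eager = new HashSet();
            eager.add("entryId");
            Set lazy = new HashSet();
            lazy.add("entryXml");

            Document doc = reader.document(0, new SetBasedFieldSelector(eager, lazy));
            System.out.println("id: " + doc.get("entryId"));

            // Only this call actually pulls the big stored value off disk.
            Fieldable body = doc.getFieldable("entryXml");
            System.out.println("entry length: " + body.stringValue().length());

            reader.close();
        }
    }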

Also, maybe Subversion could act as your repository? I don't know if it is a viable solution given licensing and all that, but it supports versioning and is pretty easy to work with; it may be overkill, though, and may complicate your architecture too much. Perhaps the best way is to define an interface to this component and one or two implementations of it (maybe flat file and BDB), and then other people can write their own.
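
Something along these lines could be enough to start with; the interface and method names below are placeholders I made up for illustration:

    import java.io.IOException;

    /**
     * Hypothetical storage abstraction for the GData server. Concrete
     * implementations could be flat file, BDB JE, Subversion, a Lucene
     * "storage" index, and so on.
     */
    public interface GDataStorage {

        /** Persists a new entry and returns the version id assigned to it. */
        String storeEntry(String feedId, String entryId, byte[] entryXml) throws IOException;

        /** Returns the latest stored version of the entry, or null if unknown. */
        byte[] getEntry(String feedId, String entryId) throws IOException;

        /** Stores an update; older versions stay retrievable for optimistic-locking checks. */
        String updateEntry(String feedId, String entryId, byte[] entryXml) throws IOException;

        /** Removes the entry from its feed. */
        void deleteEntry(String feedId, String entryId) throws IOException;
    }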

-Grant

Simon Willnauer wrote:
Yes and no :) The problem with the versioning system is still not solved,
but I did contact the Google developers to get in touch with them about it.
I will definitely have a look at BDB JE and will keep it in mind. I had a
quick look at it and it sounds quite suitable for storing feeds.
Thank you, Otis!

simon

On 5/28/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Not sure if Berkeley DB is an option, but it sounds like you just need a "storage" component for feeds, and BDB JE might be a good fit. I just used it recently for one such system and was quite happy with performance and ease of use.
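
To give you an idea of how little code it takes, here is a rough sketch; the environment path, database name and key layout are just placeholders:

    import java.io.File;

    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;

    public class FeedStoreSketch {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            Environment env = new Environment(new File("/var/gdata/store"), envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            Database entries = env.openDatabase(null, "entries", dbConfig);

            // Key each record by feed id + entry id, store the raw entry XML as the value.
            DatabaseEntry key = new DatabaseEntry("myFeed/entry-42".getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry("<entry>...</entry>".getBytes("UTF-8"));
            entries.put(null, key, value);

            // Read it back.
            DatabaseEntry found = new DatabaseEntry();
            if (entries.get(null, key, found, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                System.out.println(new String(found.getData(), "UTF-8"));
            }

            entries.close();
            env.close();
        }
    }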

Otis

----- Original Message ----
From: Simon Willnauer <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Saturday, May 27, 2006 7:33:28 PM
Subject: Lucene Gdata -- the best way to store the feeds / entries

For those who haven't heard about the GData project, please check today's
mailing list.
The Lucene indexer is supposed to be used as the search component of this
implementation. GData is an extension to the Atom/RSS formats that adds
search and a kind of versioning, and this project is a server-side
implementation of the protocol. So what's the problem? The incoming feed
entries and their updates have to be stored somewhere in persistent
storage. The easiest approach would be flat-file storage, which is not
sufficient in my eyes. I thought about using an approach similar to the
Nutch distributed file system: indexing the incoming entries in a
"searchable" index and storing the whole entry in an associated index, to
keep the search index from growing too fast.
To keep the indexes small I would create a separate index for each feed
instance, organized in the local file system.
I would be interested to hear whether anybody has experience retrieving
large data, like whole feed entries, out of a "storage" Lucene index.
Should I expect performance problems with this approach?
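
To make the idea a bit more concrete, here is roughly what I have in mind; the field names, paths and per-feed directory layout are placeholders only:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class FeedIndexSketch {
        public static void main(String[] args) throws Exception {
            // One index directory per feed instance, e.g. /var/gdata/index/<feedId>.
            String indexDir = "/var/gdata/index/myFeed";

            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            Document doc = new Document();
            doc.add(new Field("entryId", "entry-42", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("content", "some searchable entry text", Field.Store.NO, Field.Index.TOKENIZED));
            // Store (and compress) the complete entry XML so it can be returned as-is.
            doc.add(new Field("entryXml", "<entry>...</entry>", Field.Store.COMPRESS, Field.Index.NO));
            writer.addDocument(doc);
            writer.close();

            // Retrieval: look the entry up by id and pull the stored XML back out.
            IndexSearcher searcher = new IndexSearcher(indexDir);
            Hits hits = searcher.search(new TermQuery(new Term("entryId", "entry-42")));
            if (hits.length() > 0) {
                System.out.println(hits.doc(0).get("entryXml"));
            }
            searcher.close();
        }
    }
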
As far as I know Lucene doesn't support any versioning, or did that change
by any chance? Well, the protocol description doesn't say anything about
retrieving old versions (the documentation only mentions optimistic
locking / updating versions).

regards Simon

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
