hm.... putting lucene on top of BDB would actually be quite cool. it would give lucene recovery and transaction handling.

but as far as i have seen, lucene's Directory implementation hands back an InputStream. for BDB this would require us to iterate over the keys and generate that stream; equally, on insert we would have to accept the stream and break it up into keys...

is it possible to "intercept" lucene's work at the key-handling point? or would this require a larger rewrite?

regards,
Karl Øie

On Wednesday 03 April 2002 16:55, you wrote:
> > -----Original Message-----
> > From: Karl Øie [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, April 03, 2002 10:00 AM
> > To: Lucene Users List
> > Subject: Re: storing index in third party database.
> >
> > without having investigated the problem much i would think that a SQL
> > database would be a very bad match for lucene, as most of lucene's
> > work is creating keys for words and documents and then creating
> > indexes of these keys. for these purposes a SQL database is
> > unnecessary overhead, not even talking about the overhead of the
> > SQL language parser.
> >
> > for these kinds of indexes a lower-level database would be better
> > suited. I have good experiences with BerkeleyDB
> > (http://www.sleepycat.com) and a friend of mine uses gdbm
> > successfully for such key-pair indexing tasks. the advantage of
> > these low-level database systems is that they are more or less
> > persistent b-tree/hashtable implementations, and thus created for
> > key-pairing.
> >
> > they have no SQL layer, so you have to program against them
> > directly; they are more subroutine libraries than applications. but
> > for key-pair indexes i have experienced that BerkeleyDB runs circles
> > around any SQL database (including db2 and oracle!!!).
>
> I would agree with this based on my experiences in implementing the
> ANVIL system at Canon. SQL server was far too slow for simple term
> lookup. We started with gdbm and subsequently moved to Berkeley DB.
> BDB was faster in general, and more importantly, has support for
> multi-threading. Analysis with Purify suggested that gdbm has some
> "uninitialized memory read" problems. The folks at Sleepycat were also
> very helpful in getting us going.
>
> -- David
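
PS: a rough sketch of the stream-to-keys idea from my question above, in Java. Everything here is hypothetical: `BlockStore`, the key layout (file name, NUL separator, zero-padded block number), the tiny block size and the example file name are my own assumptions, and a `TreeMap` stands in for a BDB B-tree. It does not touch lucene's Directory API or the real Berkeley DB API; it only shows that a file's bytes can be split into ordered keys on write and reassembled by a key-range scan on read.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one index "file" stored as fixed-size blocks in a sorted key/value map. */
final class BlockStore {
    // Tiny on purpose so the example splits into several blocks (assumption).
    static final int BLOCK_SIZE = 4;

    // Separator between file name and block number; NUL sorts before any printable character.
    private static final char SEP = '\u0000';

    // Stand-in for a B-tree database: keys kept in sorted order, byte[] values.
    private final TreeMap<String, byte[]> db = new TreeMap<>();

    // Key layout (assumption): name + NUL + zero-padded block number, so blocks sort in order.
    private static String key(String file, int block) {
        return file + SEP + String.format(Locale.ROOT, "%010d", block);
    }

    // View of all entries belonging to one file: a plain key-range scan.
    private Map<String, byte[]> blocksOf(String file) {
        return db.subMap(file + SEP, file + (char) (SEP + 1));
    }

    /** On insert: break the incoming bytes up into blocks, one key per block. */
    void write(String file, byte[] data) {
        delete(file); // drop any older, possibly longer, version first
        for (int off = 0, block = 0; off < data.length; off += BLOCK_SIZE, block++) {
            int end = Math.min(off + BLOCK_SIZE, data.length);
            db.put(key(file, block), Arrays.copyOfRange(data, off, end));
        }
    }

    /** On read: iterate over the file's keys in order and regenerate the stream. */
    byte[] read(String file) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : blocksOf(file).values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    void delete(String file) {
        blocksOf(file).clear();
    }

    int blockCount(String file) {
        return blocksOf(file).size();
    }

    public static void main(String[] args) {
        BlockStore store = new BlockStore();
        byte[] data = "segment data".getBytes(StandardCharsets.UTF_8);
        store.write("_1.tis", data); // file name is only an example
        System.out.println(store.blockCount("_1.tis") + " blocks, round trip ok: "
                + Arrays.equals(data, store.read("_1.tis")));
    }
}
```

a real version would have to do this behind whatever stream classes Directory expects, and wrap the writes in BDB transactions to get the recovery i mentioned; whether that is possible without a larger rewrite is exactly what i am asking.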
