I think the 'updating documents' issue is almost always related to
unique document updates, where there exists some "primary unique key"
for the document. Is this true?
If so, maybe a de-facto standard like an indexed/stored/non-tokenized
OID field should be used. Then it would be easy to add the following to
IndexModifier:
addDocument(Document)
updateDocument(Document)
removeDocument(String OID)
removeDocument(Document)
That would probably simplify life for beginning Lucene users, and it
mimics the CRUD syntax most people are familiar with.
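A rough sketch of what that could look like against the 2.x API, assuming
the key lives in an indexed/stored/untokenized "oid" field (the OidIndex
class name is made up; it just delegates to the existing IndexModifier):

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

// Illustrative CRUD-style wrapper; assumes every Document carries an
// indexed, stored, untokenized "oid" field acting as its primary key.
public class OidIndex {

    private final IndexModifier modifier;

    public OidIndex(Directory dir, boolean create) throws IOException {
        modifier = new IndexModifier(dir, new StandardAnalyzer(), create);
    }

    public void addDocument(Document doc) throws IOException {
        modifier.addDocument(doc);
    }

    // Update = remove any existing document with the same OID, then add.
    public void updateDocument(Document doc) throws IOException {
        removeDocument(doc.get("oid"));
        modifier.addDocument(doc);
    }

    public void removeDocument(String oid) throws IOException {
        modifier.deleteDocuments(new Term("oid", oid));
    }

    public void removeDocument(Document doc) throws IOException {
        removeDocument(doc.get("oid"));
    }

    public void close() throws IOException {
        modifier.close();
    }
}

The caller would add the key field with something like
doc.add(new Field("oid", oid, Field.Store.YES, Field.Index.UN_TOKENIZED)).
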
On May 7, 2007, at 5:25 PM, Grant Ingersoll wrote:
Hey Gang,
Back from ApacheCon in Amsterdam, and thought I would give a bit of
a report on a few things that were interesting related to Lucene.
First off, there was a very high level of interest in Lucene and
Solr, which was great to see.
In doing a training session and a talk, there were a couple of things
that people seemed to ask about a fair amount.
1. Updates and how to do them. The whole delete/add thing just
never sits well with newcomers. I want to throw out the idea of
implementing something like the Layers functionality in photo
editing tools like Photoshop (whereby the underlying image is not
changed, but the layer adds/deletes/masks it). I wonder how
complicated it would be to mark a document as being updated and
then know that we have to look in an alternate place for
information concerning that Field/Document, such as the "updates"
file. I don't know the details of implementing it, but wanted to
see if it makes any sense at all. My gut reaction is that it would be
slower for searching, though I am not sure by how much. It could
potentially be faster for updating and could allow for per-field
updates. Just an idea, feel free to shoot it full of holes. The
other option might be to think about whether a flexible indexing
implementation could be optimized for updates instead of
searching. Optimization or merges could then bring the updates
back into the fold. (For reference, the current delete/add pattern is
sketched below, after this list.)
2. How does Lucene search compare w/ using built-in DB search? Has
anyone done a study comparing Lucene performance/quality to the likes
of MySQL/Postgres/Oracle? A related question is always how to
integrate the two.
3. There were some questions on the use cases of ParallelReader, so if
anyone cares to contribute in that arena, please do, since I haven't
used it. (One common setup is sketched below, after this list.)
4. As much as we like to ignore file format issues (PDF, etc.), it
is one of the big questions people have about using Lucene. Tika
should help in this area, but still seems to be a little way off.
Our website could help by giving more concrete advice on how to
handle different file formats and maybe even some benchmarks on
it. I think we can maintain Lucene's independence from these
libraries while still giving advice on how to handle them. Maybe a
best practices section on the wiki?
5. Distributed searching: code or a demonstration of searching across
several indexes on several machines would be useful. (A rough sketch is
included below, after this list.)
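Regarding point 1: for reference, the delete/add dance that newcomers
stumble over looks roughly like this against the 2.x API. Just a sketch,
assuming an indexed, untokenized "id" field serves as the key.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

class UpdateExample {
    // "Updating" a document means deleting the old version by its key and
    // adding the replacement; there is no in-place update.
    static void update(Directory dir, String id, Document newDoc) throws IOException {
        IndexReader reader = IndexReader.open(dir);
        reader.deleteDocuments(new Term("id", id)); // mark the old version deleted
        reader.close();

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        writer.addDocument(newDoc);                 // add the new version
        writer.close();
    }
}

The reader/writer juggling above is exactly what a layers-style or
update-optimized index structure would make unnecessary.
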
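Regarding point 3: the canonical ParallelReader use case (per its javadoc)
is two indexes holding the same documents in the same order but with
different fields, so the frequently changing fields can be rebuilt without
touching the rest. A minimal sketch, with the directory names made up:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

class ParallelReaderExample {
    // Presents two physical indexes as one logical index: stableDir holds
    // fields that rarely change, volatileDir the ones rebuilt frequently.
    static IndexSearcher open(Directory stableDir, Directory volatileDir)
            throws IOException {
        ParallelReader reader = new ParallelReader();
        reader.add(IndexReader.open(stableDir));
        reader.add(IndexReader.open(volatileDir));
        return new IndexSearcher(reader);
    }
}
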
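Regarding point 5: the pieces Lucene already ships for this are
RemoteSearchable (an RMI wrapper around a Searchable) plus MultiSearcher
on the querying side. A rough sketch only; the host names and registry
setup are illustrative.

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.RemoteSearchable;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.store.Directory;

class DistributedSearchExample {
    // On each index machine: export the local searcher over RMI.
    static void serve(Directory dir) throws Exception {
        LocateRegistry.createRegistry(1099); // start an RMI registry
        Searchable local = new IndexSearcher(dir);
        Naming.rebind("//localhost/Searchable", new RemoteSearchable(local));
    }

    // On the querying machine: federate the remote searchers into one
    // Searcher that can be queried like any local IndexSearcher.
    static MultiSearcher connect(String[] hosts) throws Exception {
        Searchable[] remotes = new Searchable[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            remotes[i] = (Searchable) Naming.lookup("//" + hosts[i] + "/Searchable");
        }
        return new MultiSearcher(remotes);
    }
}
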
At any rate, just some random thoughts garnered from ApacheCon.
All in all, a good conf. w/ lots of Lucene interest.
-Grant
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ