Re: [CLucene-dev] cloning/modifying existing documents?

Itamar Syn-Hershko Mon, 09 Aug 2010 09:32:12 -0700

Even better - I may be wrong, but it's worth a shot:

Perhaps you can find the document(s) you need to update, and pass that 
same document object to a call to IndexWriter::updateDocument()?
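
To make the idea concrete, here is a minimal sketch of the update-by-term 
flow. It does not use CLucene itself: ToyIndex, Document, find() and 
moveMessage() are hypothetical stand-ins I made up for illustration, and 
the "uid" field is an assumed unique key per message. It only models the 
semantics (delete everything matching the term, then add the modified 
document). Keep in mind that a document read back from a real index 
carries only its stored fields, which is why this may not work right away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT a CLucene type: a document is modelled as
// a set of stored field name -> value pairs.
using Document = std::map<std::string, std::string>;

// Hypothetical in-memory index used only to model update semantics.
class ToyIndex {
public:
    void addDocument(const Document& doc) { docs_.push_back(doc); }

    // Models update-by-term: remove every document whose `field`
    // equals `value`, then add `doc`.
    void updateDocument(const std::string& field, const std::string& value,
                        const Document& doc) {
        assert(!field.empty());  // a delete term needs a field name
        docs_.erase(std::remove_if(docs_.begin(), docs_.end(),
                        [&](const Document& d) {
                            auto it = d.find(field);
                            return it != d.end() && it->second == value;
                        }),
                    docs_.end());
        docs_.push_back(doc);
    }

    // Returns a copy of the first matching document, or an empty one.
    Document find(const std::string& field, const std::string& value) const {
        for (const Document& d : docs_) {
            auto it = d.find(field);
            if (it != d.end() && it->second == value) return d;
        }
        return Document();
    }

    std::size_t size() const { return docs_.size(); }

private:
    std::vector<Document> docs_;
};

// Moves a message to another folder by reusing the fields already in
// the index instead of fetching the message again.
bool moveMessage(ToyIndex& index, const std::string& uid,
                 const std::string& newFolder) {
    Document doc = index.find("uid", uid);
    if (doc.empty()) return false;         // nothing to update
    doc["folder"] = newFolder;             // change only the folder field
    index.updateDocument("uid", uid, doc); // delete old copy, add new one
    return true;
}
```

With this model, moveMessage(index, "42", "Archive") leaves the subject and 
other fields of message 42 untouched and only rewrites its folder.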


Even if it doesn't work right away, you could use

void updateDocument(Term term, Document doc, Analyzer analyzer)

and pass it a "dumb" analyzer whose tokenizer scans the term vector and 
passes every token through unchanged (effectively copying the term 
vector).
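
A rough sketch of what such a pass-through tokenizer would have to do, 
assuming the term vector was stored with positions. Again this is not 
CLucene code: TermPositions and replayTokens() are hypothetical names, and 
a real implementation would wrap the same logic in a TokenStream.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a term vector stored with positions:
// each term maps to the token positions at which it occurred.
using TermPositions = std::map<std::string, std::vector<int>>;

// Rebuilds the token sequence in position order without analyzing
// anything again. Gaps in the positions (for example removed stop
// words) are simply skipped; terms sharing a position are emitted in
// term order.
std::vector<std::string> replayTokens(const TermPositions& termVector) {
    std::multimap<int, std::string> byPosition;
    for (const auto& entry : termVector) {
        for (int position : entry.second) {
            assert(position >= 0);  // positions are never negative
            byPosition.emplace(position, entry.first);
        }
    }

    std::vector<std::string> tokens;
    tokens.reserve(byPosition.size());
    for (const auto& item : byPosition) {
        tokens.push_back(item.second);
    }
    return tokens;
}
```

Note that this replays the already analyzed terms, not the original text, 
so whatever the first analyzer dropped or rewrote stays that way.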

Itamar.

On 9/8/2010 7:19 PM, Itamar Syn-Hershko wrote:
> On 9/8/2010 4:01 PM, John O'Brien wrote:
>    
>> Hi,
>>       Apologies if this has already been covered in previous posts but
>> I've not been able to find the answer in the archive so far.
>>
>> We have an application which indexes mail messages. We get the
>> information for each message over IMAP, create the fields (e.g. subject,
>> body, folder etc) and write the documents to the index. When a mail
>> message is moved from one IMAP folder to another, our application gets
>> notified of the move and we want to update the folder field in the
>> existing document, so we create a new document, delete the existing one
>> and write the new one. What I'm wondering is how other people use
>> existing documents to create new ones - at the moment we get all the
>> information over IMAP again which is obviously very inefficient but to
>> make it more efficient we are now going to change it to retrieve all the
>> fields and terms for the existing document and create the new document
>> using them. Is there another (better/more efficient) way of doing this
>> than retrieving the fields and terms for the existing document?
>>
>>      
> I guess you could use stored fields (Field::STORE_YES) to keep this data
> handy. It will make your index larger, but it will save you from
> retrieving the data again over IMAP or reconstructing it from the term
> vectors (which might not work correctly for some analyzer
> implementations). You can also use compressed fields if this is a lot of
> data (watch out though, they have been deprecated in Java Lucene 2.9 or
> so, and I assume we will have to update CLucene accordingly when the
> time comes).
>
> Itamar.
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
> CLucene-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>
>
>    
