Re: [CLucene-dev] cloning/modifying existing documents?

John O'Brien Mon, 09 Aug 2010 10:13:58 -0700

Thanks Itamar for the quick response.
I'll let you know how we get on in the next few days.


Thanks,
John.

Itamar Syn-Hershko wrote:
> Even better - I may be wrong, but it's worth a shot:
>
> Perhaps you can find the document(s) you need to update, and use that 
> same document object with a call to IndexWriter::updateDocument()?
>
> Even if it doesn't work right away, you could use
>
> void updateDocument(Term term, Document doc, Analyzer analyzer)
>
> and pass it a "dumb" analyzer, that uses a tokenizer that scans the term 
> vector, and just approves all the tokens as they are (practically 
> copying the term vector).
>
> Itamar.
>
> On 9/8/2010 7:19 PM, Itamar Syn-Hershko wrote:
>   
>> On 9/8/2010 4:01 PM, John O'Brien wrote:
>>    
>>     
>>> Hi,
>>>       Apologies if this has already been covered in previous posts but
>>> I've not been able to find the answer in the archive so far.
>>>
>>> We have an application which indexes mail messages. We get the
>>> information for each message over IMAP, create the fields (e.g. subject,
>>> body, folder etc.) and write the documents to the index. When a mail
>>> message is moved from one IMAP folder to another, our application gets
>>> notified of the move and we want to update the folder field in the
>>> existing document, so we create a new document, delete the existing one
>>> and write the new one.
>>>
>>> What I'm wondering is how other people use existing documents to create
>>> new ones. At the moment we get all the information over IMAP again,
>>> which is obviously very inefficient. To make it more efficient, we are
>>> going to retrieve all the fields and terms for the existing document
>>> and create the new document from them. Is there another (better/more
>>> efficient) way of doing this than retrieving the fields and terms for
>>> the existing document?
>>>
>>>      
>>>       
>> I guess you could use stored fields (Field::STORE_YES) to have this data
>> handy? It will make your index larger, but will save you from
>> retrieving the data again over IMAP or reconstructing it from the term
>> vectors (which might not work correctly for some analyzer
>> implementations). You can also use compressed fields if this is a lot of
>> data (watch out though, they have been deprecated in Java Lucene 2.9 or
>> so, and I assume we will have to update CLucene accordingly when the
>> time comes).
>>
>> Itamar.
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by
>>
>> Make an app they can't live without
>> Enter the BlackBerry Developer Challenge
>> http://p.sf.net/sfu/RIM-dev2dev
>> _______________________________________________
>> CLucene-developers mailing list
>> CLucene-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>>
>>
>>    
>>     
>
>   
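
For reference, here is a minimal sketch of the stored-fields idea from the
thread: rebuild the replacement document from the values already kept in the
index and change only the folder field. It deliberately does not call
CLucene. StoredField, StoredDoc and cloneWithField are hypothetical stand-ins
invented for illustration, and the field names are assumptions. In a real
index the resulting fields would be added to a new Document and written back
with the IndexWriter::updateDocument() call mentioned above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the stored fields of one indexed message:
// an ordered list of (field name, stored value) pairs. Not a CLucene type.
using StoredField = std::pair<std::string, std::string>;
using StoredDoc = std::vector<StoredField>;

// Return a copy of `original` in which every field called `name` is dropped
// and a single field (name, value) is appended. The original is not modified,
// so it can still be used to locate and delete the old document.
StoredDoc cloneWithField(const StoredDoc& original,
                         const std::string& name,
                         const std::string& value) {
    StoredDoc copy;
    copy.reserve(original.size() + 1);
    for (const StoredField& field : original) {
        if (field.first != name) {
            copy.push_back(field);  // unchanged field: reuse the stored value
        }
    }
    copy.emplace_back(name, value);  // the one field that actually changed
    return copy;
}
```

For a message moved from INBOX to Archive, cloneWithField(doc, "folder",
"Archive") yields the same subject, body and so on with only the folder
value replaced, so nothing has to be fetched over IMAP again. This only
works for fields that were stored when the message was first indexed.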

