RE: Delete problems O.O

Steven A Rowe Mon, 11 Feb 2008 12:06:06 -0800

Hi Cesar,

On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote:
> I'm running problems with document deletion.
> [...]
> This simply doesn't delete anything from the Index.
> 
> //see the code sample: 
> //"theFieldName" was previously stored as Field.Store.YES and 
> Field.Index.TOKENIZED. 
> Term t = new Terms("theFieldName", "theFieldContent");
> objIndexReader.DeleteDocuments(t);


(You have two typos here - "new Term/s/" and /D/eleteDocuments() - I assume 
that this is just a transcription error, since you must have gotten this code 
to run...)

When you construct a Term instance, no analysis will be performed on 
"theFieldContent".  Since "theFieldName" is TOKENIZED, it was analyzed, and 
this is likely where the mismatch is occurring.  From 
<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/index/IndexReader.html#deleteDocuments(org.apache.lucene.index.Term)>:

    This is useful if one uses a document field to
    hold a unique ID string for the document.

If you're trying to delete documents based on a document ID held as the entire 
value of a field, then you should be using Field.Index.UN_TOKENIZED.  From 
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/document/Field.Index.html#UN_TOKENIZED>:

   Index the field's value without using an Analyzer,
   so it can be searched. As no analyzer is used the
   value will be stored as a single term. This is
   useful for unique Ids like product numbers.

> 2) DeleteDocument(numDoc) <== this problem is a woot problem
> 
> [...]
> 
> I mean, if I call objIndexReader.DeleteDocument(0), it will
> delete the first document from the entire INDEX, not the
> first document in the Hits collection. So, it deleted the
> first documents I have inserted some days ago, in previous
> indexing sessions.

Yes, this is how this method is designed to function.  The javadoc description 
is perhaps too brief: "Deletes the document numbered 'docNum'".  As you have 
discovered, "docNum" is the one-up number assigned internally by Lucene to each 
document as it is added to the index.
 
> I ask: is there a way to get the correct docNum from the
> document retrieved in the Hits collection?

Check out Hits.id(int): 
<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Hits.html#id(int)>

The "id" returned by Hits.id(int) is the same thing as the "docNum" parameter 
to IndexReader.deleteDocument(int).

It sounds like the documentation could benefit from some more discussion of the 
"docNum"/document "id" feature...

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: Delete problems O.O

Reply via email to