Hi Cesar,
On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote:
> I'm running problems with document deletion.
> [...]
> This simply doesn't delete anything from the Index.
>
> //see the code sample:
> //"theFieldName" was previously stored as Field.Store.YES and
> Field.Index.TOKENIZED.
> Term t = new Terms("theFieldName", "theFieldContent");
> objIndexReader.DeleteDocuments(t);
(You have two typos here - "new Term/s/" and /D/eleteDocuments() - I assume
that this is just a transcription error, since you must have gotten this code
to run...)
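When you construct a Term instance, no analysis is performed on
"theFieldContent". Since "theFieldName" is TOKENIZED, its value was run through
the analyzer at indexing time, so the index holds the analyzer's output tokens
rather than the original string. The un-analyzed term text then matches only if
it happens to equal one of those tokens, and this is likely where the mismatch
is occurring. From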
<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/index/IndexReader.html#deleteDocuments(org.apache.lucene.index.Term)>:
    This is useful if one uses a document field to
    hold a unique ID string for the document.
If you're trying to delete documents based on a document ID held as the entire
value of a field, then you should be using Field.Index.UN_TOKENIZED. From
<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/document/Field.Index.html#UN_TOKENIZED>:
    Index the field's value without using an Analyzer,
    so it can be searched. As no analyzer is used the
    value will be stored as a single term. This is
    useful for unique Ids like product numbers.
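To make the mismatch concrete, here is a toy model in plain Java with no Lucene dependency. It is only a sketch: ToyTermIndex, analyze(), add() and countMatches() are names made up for illustration, and the lowercase-and-split-on-whitespace step merely stands in for whatever analyzer you actually indexed with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: this is NOT Lucene code. It mimics how an analyzed
// (TOKENIZED) field and an un-analyzed (UN_TOKENIZED) field end up as
// different terms in an inverted index.
class ToyTermIndex {
    // term text -> numbers of the documents containing that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocNum = 0;

    // Stand-in for an analyzer (an assumption): lowercase, split on whitespace.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one field value and return the document number assigned to it.
    int add(String value, boolean tokenized) {
        int docNum = nextDocNum++;
        List<String> terms = tokenized
                ? analyze(value)
                : Collections.singletonList(value);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docNum);
        }
        return docNum;
    }

    // Exact lookup with no analysis of the argument: the number of documents
    // a term-based delete would hit in this toy model.
    int countMatches(String exactTerm) {
        List<Integer> docs = postings.get(exactTerm);
        return docs == null ? 0 : docs.size();
    }
}
```

In this model, adding "Product ID 42" with tokenized set to true and then looking up the full string "Product ID 42" finds no documents, because only the individual lowercased tokens were stored. Adding the same value with tokenized set to false stores it as one term, and the same lookup then finds the document.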
> 2) DeleteDocument(numDoc) <== this problem is a woot problem
>
> [...]
>
> I mean, if I call objIndexReader.DeleteDocument(0), it will
> delete the first document from the entire INDEX, not the
> first document in the Hits collection. So, it deleted the
> first documents I have inserted some days ago, in previous
> indexing sessions.
Yes, this is how the method is designed to work. The javadoc description
is perhaps too brief: "Deletes the document numbered 'docNum'". As you have
discovered, "docNum" is the sequential number that Lucene assigns internally to
each document as it is added to the index, so it is relative to the whole index
and not to any particular Hits collection.
> I ask: is there a way to get the correct docNum from the
> document retrieved in the Hits collection?
Check out Hits.id(int):
<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Hits.html#id(int)>
The "id" returned by Hits.id(int) is the same thing as the "docNum" parameter
to IndexReader.deleteDocument(int).
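The difference between a position in the result list and the index-wide number can be sketched in plain Java as well. Again this is a toy model and not Lucene code: ToyHits is a hypothetical class, its substring match is an arbitrary stand-in for a real query, and only its id(int) method is meant to mirror the role of Hits.id(int).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not Lucene code: shows why a position in a result list
// differs from the index-wide document number.
class ToyHits {
    // Index-wide document numbers of the matching documents, in hit order.
    private final List<Integer> docNums = new ArrayList<>();

    // Scan every indexed document and remember the index-wide number of each
    // one whose text contains the search string.
    ToyHits(List<String> indexedDocs, String needle) {
        for (int docNum = 0; docNum < indexedDocs.size(); docNum++) {
            if (indexedDocs.get(docNum).contains(needle)) {
                docNums.add(docNum);
            }
        }
    }

    // Number of hits.
    int length() {
        return docNums.size();
    }

    // Plays the role of Hits.id(int): hit position -> index-wide docNum.
    int id(int hitPosition) {
        return docNums.get(hitPosition);
    }
}
```

With three documents "old report", "old invoice" and "new invoice" and a search for "new", the only hit sits at position 0 of the results but has index-wide number 2. Passing the position 0 to a delete-by-number call would remove "old report", which matches what you observed; the number to pass is the one returned by id(0).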
It sounds like the documentation could benefit from some more discussion of the
"docNum"/document "id" feature...
Steve