Change behavior of ParallelReader.document(int)
-----------------------------------------------
Key: LUCENE-606
URL: http://issues.apache.org/jira/browse/LUCENE-606
Project: Lucene - Java
Type: Improvement
Components: Index
Versions: 2.0.0
Reporter: Christian Kohlschuetter
Currently, the returned documents contain, for each field, the stored data from
all enclosed IndexReaders which contain the corresponding field.
That is, a call to ParallelReader.document(doc).getFields(fieldName) returns an
array that may contain several Field objects. Since null entries are
disallowed, the array positions cannot be matched to the enclosed readers, so
there is no way to determine which IndexReader a given Field came from.
On the other hand, a search for a term on that field only yields results if
that term is contained in the *first* matching IndexReader which contains the
field.
Thus, when merging the ParallelReader contents into another index through an
IndexWriter, the indexed data does not correspond to the stored information.
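The mismatch described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions: plain maps stand in for the enclosed
IndexReaders, and ToyReader, storedValues and matches are hypothetical names
that are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types or methods exist in Lucene.
class ParallelMismatchSketch {

    // Hypothetical stand-in for one enclosed reader holding a single document:
    // the stored value and the indexed terms of each field.
    static final class ToyReader {
        final Map<String, String> stored = new LinkedHashMap<>();
        final Map<String, List<String>> indexed = new LinkedHashMap<>();

        ToyReader add(String field, String value) {
            stored.put(field, value);
            indexed.put(field, Arrays.asList(value.split(" ")));
            return this;
        }
    }

    // Stored data: collected from every reader that contains the field.
    static List<String> storedValues(List<ToyReader> readers, String field) {
        List<String> out = new ArrayList<>();
        for (ToyReader r : readers) {
            if (r.stored.containsKey(field)) {
                out.add(r.stored.get(field));
            }
        }
        return out;
    }

    // Indexed data: only the first reader that contains the field is consulted.
    static boolean matches(List<ToyReader> readers, String field, String term) {
        for (ToyReader r : readers) {
            if (r.indexed.containsKey(field)) {
                return r.indexed.get(field).contains(term);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ToyReader> readers = Arrays.asList(
            new ToyReader().add("title", "apache lucene"),
            new ToyReader().add("title", "parallel reader"));
        System.out.println(storedValues(readers, "title"));
        System.out.println(matches(readers, "title", "lucene"));
        System.out.println(matches(readers, "title", "parallel"));
    }
}
```

In this model the stored values of both readers are returned, but only the
terms of the first reader are searchable, so "parallel" is stored yet cannot
be found.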
I am not sure whether this can be considered a bug (in some cases, this may be
exactly what is required). However, I would like to see an option to change
this behaviour.
I suggest a parameter for ParallelReader which specifies whether
ParallelReader.document(int) returns stored data from all IndexReaders or only
from the one which is responsible for the field's indexed data.
Please find my proposed implementation attached, as well as a JUnit testcase.
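To make the suggested switch concrete, here is a minimal sketch of the two
modes. It is not the attached patch and does not use the Lucene API: maps
stand in for the enclosed readers, the flag name fromAllReaders is
hypothetical, and the first reader containing a field is assumed to be the one
responsible for its indexed data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in Lucene.
class StoredFieldOptionSketch {

    // Each map is a hypothetical reader: field name -> stored value of one document.
    static List<String> storedValues(List<Map<String, String>> readers,
                                     String field,
                                     boolean fromAllReaders) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> reader : readers) {
            if (!reader.containsKey(field)) {
                continue;
            }
            out.add(reader.get(field));
            if (!fromAllReaders) {
                // Assumption: the first reader with the field owns its indexed data.
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> readers = Arrays.asList(
            Collections.singletonMap("title", "apache lucene"),
            Collections.singletonMap("title", "parallel reader"));
        System.out.println(storedValues(readers, "title", true));
        System.out.println(storedValues(readers, "title", false));
    }
}
```

With the flag set to false, the stored value returned is the one that matches
the searchable terms, which is the case that matters when merging into another
index.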