On 27/07/2012 00:50, Mike O'Leary wrote:
Hi Robert,
Thanks for your help. This cleared up all of the things I was having trouble
understanding about offsets and positions in term vectors.
Mike
-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Friday, July 20, 2012 5:59 PM
To: java-user@lucene.apache.org
Subject: Re: Problem with TermVector offsets and positions not being preserved
On Fri, Jul 20, 2012 at 8:24 PM, Mike O'Leary <tmole...@uw.edu> wrote:
Hi Robert,
I'm not trying to determine whether a document has term vectors, I'm trying to
determine whether the term vectors that are in the index have offsets and
positions stored.
Right: what I'm trying to tell you is that whether offsets and positions are
stored is not an index-wide setting for a field: it's per-document.
I think all the tools you are using to check these values are not doing it
correctly:
1. DumpIndex is wrongly using values from the Document returned by
IndexReader.document(), but that doesn't retrieve these values and never did (it
would take 2 extra disk seeks per document to figure out the term vector flags).
2. I haven't looked at Luke, but it's probably printing the "global"
bits from FieldInfos. It used to be that we wrote some bits for these options.
I never knew what the purpose was: since these options can be switched
on or off at a per-document level, they make no sense.
Because of this we stopped writing these bits in 3.6 (we only write into
FieldInfos if the field has any term vectors at all), and that's probably what's
confusing you there.
Catching up with this thread ... Luke 4.0-ALPHA makes a similar mistake.
I fixed this in svn (to be released in a week or so) so that:
* Luke now actually checks whether a doc has term vectors for a
particular field and adjusts the field flags based on the
presence/absence of a term vector. FieldInfos were not enough to handle
some combinations.
* Luke doesn't show the offsets/positions flags in the document view,
since they are not known in advance. However, the pop-up that shows a
term vector correctly shows positions and offsets if available (or
blanks if not available).
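To illustrate the per-document point, here is a minimal sketch in plain Java. It is a toy model, not Lucene code and not Luke's actual implementation: the TermVector and Doc classes and the describe() helper are hypothetical names made up for this example. It only shows why the flags have to be read from each document's own term vector rather than from field-level metadata. In Lucene 4.x the real lookup would, I believe, go through IndexReader.getTermVector(docID, field) and then Terms.hasPositions() / Terms.hasOffsets(), but please check that against the javadocs for your version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class TermVectorFlagsSketch {

    /** Hypothetical stand-in for one document's term vector on one field. */
    static final class TermVector {
        final boolean hasPositions;
        final boolean hasOffsets;

        TermVector(boolean hasPositions, boolean hasOffsets) {
            this.hasPositions = hasPositions;
            this.hasOffsets = hasOffsets;
        }
    }

    /** Hypothetical stand-in for a document: field name -> term vector, absent if none was stored. */
    static final class Doc {
        final Map<String, TermVector> vectors = new LinkedHashMap<String, TermVector>();

        Doc with(String field, boolean positions, boolean offsets) {
            vectors.put(field, new TermVector(positions, offsets));
            return this;
        }
    }

    /** Reports what is actually stored for one field of one document. */
    static String describe(Doc doc, String field) {
        TermVector tv = doc.vectors.get(field);
        if (tv == null) {
            return "no term vector";
        }
        StringBuilder sb = new StringBuilder("term vector");
        if (tv.hasPositions) {
            sb.append(" +positions");
        }
        if (tv.hasOffsets) {
            sb.append(" +offsets");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three documents sharing the same field name, each indexed with different options.
        Doc a = new Doc().with("body", true, true);
        Doc b = new Doc().with("body", false, false);
        Doc c = new Doc(); // no term vector for "body" at all

        for (Doc d : Arrays.asList(a, b, c)) {
            System.out.println(describe(d, "body"));
        }
    }
}
```

The three documents share one field name but give three different answers, which is why a single field-wide flag cannot describe them.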
--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com