[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord

David Smiley (JIRA) Fri, 01 Aug 2014 15:24:11 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083064#comment-14083064
 ]


David Smiley commented on LUCENE-5156:
--------------------------------------

I agree on the caching point -- that is, my earlier suggestion about the case 
where you ask for the Terms of the same document again.  Never mind that part 
-- as I thought about it, I realized I don't need it after all.

bq. But I don't think it should be in the default codec. I also happen to think 
term vectors aren't a good data structure for highlighting anyway.

The default highlighter fully respects the positions and other aspects of the 
user's query, unlike the other highlighters.  Some applications demand that a 
highlight is accurate to the query, even if the query uses custom span queries 
that do tricks with payloads, etc.  It would be nice if the other highlighters 
supported accurate highlights for such queries, but they don't, so today the 
default highlighter is the one to use when complex queries need accurate 
highlights.  It requires a Terms instance reflecting the current document -- it 
currently gets one by re-inverting the document into a MemoryIndex, but it can 
be hacked to accept a Terms directly from term vectors.

So you don't like the idea of improving the performance of term vector seekCeil 
in the default codec?  Is that a -1 or a -0?  The change I propose seems 
harmless -- the code would not create and build up the new offset array unless 
the consuming code calls seekCeil or the ord methods.
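
To make the proposal concrete, here is a rough, self-contained sketch of the 
lazy offset array idea.  It is not the attached patch and not Lucene code: the 
class LazyOrdTermsSketch, its length-prefixed char[] layout, and all method 
names are hypothetical stand-ins for a per-document term vector.  It also 
compares terms with String.compareTo, whereas Lucene orders terms by their 
UTF-8 bytes.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for one document's term vector; not Lucene code. */
final class LazyOrdTermsSketch {
    // Sorted terms stored back to back as [length][chars...][length][chars...],
    // so without extra state they can only be decoded sequentially.
    private final char[] data;
    private final int termCount;
    // Start offset of each term in data; stays null until a seek needs it.
    private int[] offsets;

    /** Assumes sortedTerms is in String natural order, each term under 65536 chars. */
    LazyOrdTermsSketch(List<String> sortedTerms) {
        StringBuilder sb = new StringBuilder();
        for (String term : sortedTerms) {
            sb.append((char) term.length()).append(term);
        }
        data = sb.toString().toCharArray();
        termCount = sortedTerms.size();
    }

    private String termAt(int offset) {
        return new String(data, offset + 1, data[offset]);
    }

    /** Plain iteration: never allocates or fills the offset array. */
    void forEachTerm(Consumer<String> consumer) {
        for (int pos = 0; pos < data.length; pos += 1 + data[pos]) {
            consumer.accept(termAt(pos));
        }
    }

    /** Builds the offset array with one O(n) pass, on first use only. */
    private int[] offsets() {
        if (offsets == null) {
            int[] built = new int[termCount];
            int pos = 0;
            for (int ord = 0; ord < termCount; ord++) {
                built[ord] = pos;
                pos += 1 + data[pos];
            }
            offsets = built;
        }
        return offsets;
    }

    /** Seek-by-ord: O(1) once the offsets exist. */
    String termForOrd(int ord) {
        return termAt(offsets()[ord]);
    }

    /** Returns the ord of the smallest term >= target, or -1 if there is none. */
    int seekCeil(String target) {
        int[] offs = offsets();
        int lo = 0;
        int hi = termCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = termAt(offs[mid]).compareTo(target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid; // exact match
            }
        }
        return lo < termCount ? lo : -1;
    }

    /** Exposed only so the laziness can be observed. */
    boolean offsetsBuilt() {
        return offsets != null;
    }
}
```

In this sketch a consumer that only iterates pays nothing extra, while the 
first seekCeil or termForOrd call pays one linear pass and later seeks are a 
binary search or an array lookup.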

> CompressingTermVectors termsEnum should probably not support seek-by-ord
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-5156
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5156
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 4.5, 5.0
>
>         Attachments: LUCENE-5156.patch
>
>
> Just like term vectors before it, it has a O(n) seek-by-term. 
> But this one also advertises a seek-by-ord, only this is also O(n).
> This could cause e.g. checkindex to be very slow, because if termsenum 
> supports ord it does a bunch of seeking tests. (Another solution would be to 
> leave it, and add a boolean so checkindex never does seeking tests for term 
> vectors, only real fields).
> However, I think it's also kinda a trap; in my opinion, if seek-by-ord is 
> supported anywhere, you kinda expect it to be faster than linear time...?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

