[ 
https://issues.apache.org/jira/browse/LUCENE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786610#action_12786610
 ] 

Uwe Schindler commented on LUCENE-2110:
---------------------------------------

>From our IRC chat between Mike and me:
{quote}
[10:19] ThetaPhi_: mike: any further comments for 2110?
[10:19] mikemccand: am just trying to look at it -- waiting on a sloooooow svn 
up.
[10:19] ThetaPhi_: this out of order is not trivial and makes code for 
automaton and NRQ more complicated
[10:19] mikemccand: the change is a great step forward
[10:19] mikemccand: yeah forget it 
[10:19] ThetaPhi_: yes
[10:19] ThetaPhi_: you understood the problem
[10:19] mikemccand: i had thought it was nearly free
[10:19] mikemccand: i don't fully understand the problem
[10:19] mikemccand: impossible to keep up with you!!
[10:20] ThetaPhi_: when the underlying enum is exhausted (I was exhausted 
yesterday, too), it stops processing, even if there are more nextSeelkTerms
[10:20] mikemccand: yes I saw your exhausted pun ;)
[10:20] mikemccand: wait -- if a TermsEnum becomes exhaused, you're unable to 
seek it again?  is that the issue?
[10:20] ThetaPhi_: waht you could do is, that after the underlying enum is 
exhausted, that you call nextSeekTerm and reposition
[10:21] ThetaPhi_: yes
[10:21] ThetaPhi_: you could seek it
[10:21] ThetaPhi_: and this worked at the beginning (first patch without the 
additional enum constants)
[10:21] ThetaPhi_: what it did was: when the underlying enum was exhausted it 
simply called nextSeekTerm and seeked
[10:22] mikemccand: so TermsEnum API allows seeks after next()'ing to 
exhaustion, right?  (I think it should but maybe it' doesn't somewhere?)
[10:22] ThetaPhi_: which was ok
[10:22] ThetaPhi_: mikemccand: your enums works perfect
[10:22] ThetaPhi_: they can be seeked everytime even when exhausted
[10:22] mikemccand: ok
[10:23] ThetaPhi_: the problem is logic in filteredtermsenum
[10:23] mikemccand: ok
[10:23] ThetaPhi_: it gets very complicated when you want to support two things:
[10:23] ThetaPhi_: a) seeking on request (NO_AND_SEEK, YES_AND_SEEK)
[10:24] ThetaPhi_: b) and want to consume all nextSeekTerms()
[10:24] ThetaPhi_: if somebody seeks after _AND_SEEK to e.g. the last term of 
the enum
[10:24] ThetaPhi_: in the current impl it will read that term
[10:25] ThetaPhi_: call termCompare and so on
[10:25] ThetaPhi_: if this accepts the term or not accepts the term
[10:25] ThetaPhi_: it will call next() on the underlying enum again
[10:25] ThetaPhi_: which then returns null
[10:26] ThetaPhi_: with the current patch it will then also end the 
filteredtermsenum
[10:26] mikemccand: ok -- so the termination logic inside FTE.next() got really 
hairy if on getting null from the actualEnum you had to consider going back for 
another seek term?
[10:26] ThetaPhi_: yes
[10:26] ThetaPhi_: i tried it yesterday
[10:26] mikemccand: we certainly don't need to support thist oday
[10:26] ThetaPhi_: and I had no good test
[10:27] ThetaPhi_: yes exactly
[10:27] mikemccand: if somehow this ever becomes needed then we add it then
[10:27] mikemccand: the new FTE looks great
[10:27] ThetaPhi_: the idea was to provide a param to nextSeekTerm which 
denotes if the underlying enum ended
[10:27] mikemccand: NRQ is soooo much simpler :)
[10:28] ThetaPhi_: the idea was to call even on exit nextSeekTerm(true), where 
triue meant exhausted
[10:28] mikemccand: i see -- so it involved different API
[10:28] ThetaPhi_: TEnums like NRQ or automaton only going forward then know, 
not to provide seek terms again
[10:28] ThetaPhi_: this was important for Automaton not to fall into endless 
loops
[10:29] mikemccand: this is hairy stuff :)
[10:29] ThetaPhi_: and good for NRQ to not provide any further terms
[10:30] ThetaPhi_: when I tried to implement there was always a problem and you 
were not able to correctly define waht should happen then
[10:30] mikemccand: how come the switch in FTE.next doesn't have "case NO" on 
the return from accept(term)?
[10:31] ThetaPhi_: thats not needed it just falls through and loops to next term
[10:31] ThetaPhi_: you could add it as NO: break;
[10:31] * mikemccand ahh got it -- maybe just add comment saying so
[10:31] ThetaPhi_: good idea, will do
[10:32] ThetaPhi_: the NO_AND_seek case is the same it just says doSeek = true
[10:32] ThetaPhi_: and for yes it returns the term in both cases, but records 
doSeek for the next time next() is called
[10:32] mikemccand: i wonder if we could simply add a seek term to AcceptStatus?
[10:33] mikemccand: vs calling the separate nextSeekTerm method
[10:33] ThetaPhi_: enum's contents are final
[10:33] mikemccand: not sure it'd be better... just wondering
[10:33] ThetaPhi_: because each constant is a singleton, they should be 
unmodifiable
[10:34] mikemccand: ie, fix AcceptStatus to be like TermsEnum.SeekResult
[10:34] mikemccand: so it returns status, but also an optional seekTerm which 
if null means no seeking
[10:34] ThetaPhi_: and how to do this, not with an enum constant
[10:34] ThetaPhi_: because they are constants
[10:35] ThetaPhi_: and createing a new object on each accept call seems not 
ideal
[10:35] mikemccand: sorry, i was confused -- SeekStatus used to be a class that 
had two attrs -- a TermRef, and the enum; i simplified that a while ago
[10:36] ThetaPhi_: aaaah
[10:36] ThetaPhi_: now its an enum
[10:36] ThetaPhi_: i do not think its a ood idea to return new objects
[10:36] mikemccand: right.  for a while I had no TermsEnum.term()
[10:36] mikemccand: the enums would reuse their status object
[10:36] mikemccand: ie API would require that this is fine
[10:37] ThetaPhi_: but enums are singleton
[10:37] ThetaPhi_: for the whole JVM
[10:37] mikemccand: if we did that... then we could go back to YES/NO/END, and, 
and seekTerm is then orthogonal
[10:37] ThetaPhi_: ok
[10:37] ThetaPhi_: so a protected setSeekTerm
[10:38] mikemccand: we'd make a new "AcceptResult" class.  has "AcceptStatus 
status" member, and also "TermRef seekTerm"
[10:38] ThetaPhi_: then we are back at the state before, we should ask robert
[10:38] mikemccand: ok
[10:38] ThetaPhi_: for him the separation of accept and nextSeekTerm was good
[10:38] mikemccand: ahh i see
[10:39] ThetaPhi_: but he could do the calculation of the next sring in accept, 
too
[10:39] mikemccand: though, can't he simply emulate?  ie, internally call a 
private nextSeekTerm, stick it on the returned AcceptResult, and return that?
[10:39] mikemccand: right
[10:39] ThetaPhi_: telephone...
[10:39] mikemccand: do we ever call nextSeekTerm, unless AcceptStatus was 
XXX_AND_SEEK?
[10:39] mikemccand: hah real world interrupts
[10:40] ThetaPhi_: thats what is done currentl
[10:40] ThetaPhi_: brb
[10:41] mikemccand: man all of infra seems slow right now... jira giving me 
intermittent Internal Server Error... svn really slow
[10:41] mikemccand: git is tempting
[11:04] *** jwtoddii has joined #lucene.
[11:15] *** jwtoddii has signed off IRC ().
[11:15] ThetaPhi_: re
[11:15] mikemccand: hello
[11:15] ThetaPhi_: for me infra works normal
[11:16] mikemccand: hmmm
[11:16] ThetaPhi_: but maybe because i am on the euorpe svn
[11:16] mikemccand: you use the eu mirrors?
[11:16] mikemccand: ok
[11:16] ThetaPhi_: but for jira i do not know
[11:16] mikemccand: jira is on/off for me now
[11:16] ThetaPhi_: do not think there is a mirror
[11:17] ThetaPhi_: [10:39] mikemccand: do we ever call nextSeekTerm, unless 
AcceptStatus was XXX_AND_SEEK?
[11:17] ThetaPhi_: no
[11:18] mikemccand: ok
[11:18] ThetaPhi_: and doSeek is always reset to false whenever a seek occured
[11:19] ThetaPhi_: the global doSeek is just for the YES_AND_SEEK case
[11:19] ThetaPhi_: because it exits the loop and returns the term
[11:19] ThetaPhi_: so on the next call to next() doSeek is true and we seek
[11:20] mikemccand: got it
[11:21] ThetaPhi_: an idea about the nextSeekTerm:
[11:21] ThetaPhi_: we add a protected setSeekTerm(TermRef)
[11:21] ThetaPhi_: this sets a private var
[11:22] ThetaPhi_: accept can set this and return YES or no wahtever
[11:22] ThetaPhi_: in the case of END it is ignored
[11:22] ThetaPhi_: but when next() is then called again, if this internal seek 
varaible is != null it seeks, else it goes forward with next
[11:22] ThetaPhi_: and after seek it is set to null
[11:23] ThetaPhi_: and we could add an assert inside this method to check if 
only seeking forward
[11:24] ThetaPhi_: to inform implementors of errors
[11:24] ThetaPhi_: assert termComp.compare(seekTerm, term()) > 0;
[11:24] mikemccand: why not simply return the optional seek term along w/ 
AcceptStatus?  it'd require no additional methods, and makes it clear that 
seeking part of accepting. 
[11:25] mikemccand: ie the existince of new methods in the api (setSeekTerm, 
nextSeekTerm) enrich the api -- make you wonder when you can call them.  eg can 
ctor call setSeekTerm?
[11:25] ThetaPhi_: because accept status is an enum constant which is final, 
singleton and 
[11:25] ThetaPhi_: yes ctor can do it at the moment
[11:25] ThetaPhi_: its already there its only called setinitialseekterm
[11:26] ThetaPhi_: (see docs)
[11:26] mikemccand: i mean make a new class (AcceptResult), like i described.  
it contains an AcceptResult.Status (the enum), and a TermRef.
[11:26] ThetaPhi_: the problem is lots of object creation
[11:26] mikemccand: there would be no object creation -- we'd reuse?
[11:26] ThetaPhi_: in my opinion setSeekTerm is simplier to use
[11:27] ThetaPhi_: it would be the same like the reuse case
[11:27] ThetaPhi_: and setseekterm can also be used in ctor
[11:27] mikemccand: yeah the fact that ctor needs to set initial term does seem 
to require the extra method
[11:28] mikemccand: though i also don't like side-effect methods -- you change 
the internal state of the class, vs returning the seek term
[11:28] mikemccand: ie it makes the api stateful, which except for the initial 
case, is overkill
[11:29] ThetaPhi_: its an iterator it always has an internal state
[11:29] ThetaPhi_: i understand your problem
[11:29] ThetaPhi_: you call in a method that should only accept something a 
method that changes state
[11:30] mikemccand: the internal state is handled by FTE; my subclass is mostly 
stateless, if i return the seek term
[11:30] ThetaPhi_: so you cannot call it from outside (which would not wrok, 
because it is protected)
[11:30] ThetaPhi_: NRQ also has a state and also automaton
[11:31] ThetaPhi_: in NRQ it is the linkedlist with seek terms
[11:31] ThetaPhi_: in Automaton the nextString()
[11:31] mikemccand: right, they have their own state because of how they 
iterate.  it's just that seek term need not be stateful.
[11:32] ThetaPhi_: i think we should ask robert, at the moment i would know how 
to change his code to support both cases
[11:33] mikemccand: in fact, instead of setInitialSeekTerm, could we have 
getInitialSeekTerm?  ie FTE would invoke that once on start
[11:33] mikemccand: ok :)
[11:33] mikemccand: automaton is scary
[11:33] mikemccand: and, powerful
[11:33] ThetaPhi_: the getInitialSeek term in the current api is hidden behind 
the iterator
[11:34] ThetaPhi_: setInitialSeekTerm for the ctor is just a convenience
[11:34] ThetaPhi_: to prevent subclasses like prefixtermsenum from overrideing 
and implementing the singleton itertor
[11:34] * mikemccand w/ getInitialSeekTerm, there is no sneaky shared state w/ 
FTE.  i mean, FTE keeps track of its state, and each subclass tracks its own 
state.  subclassing wouldn't even be necessary anymore -- one could provdie a 
standonalone TermsEnumFilter, that just has getInitialSeekTerm and accept
[11:34] ThetaPhi_: (see docs)
[11:36] mikemccand: in general I don't like subclassing APIs -- it's a bigger 
surface area (how child works w/ parent, stateful methods, when can i call each 
method, etc.)
[11:36] ThetaPhi_: we could remove that also in the current api
[11:37] ThetaPhi_: only abstract classes should be subclassable
[11:37] ThetaPhi_: thats asked very often
[11:37] mikemccand: gonna be a cold run this AM -- 28F out there
[11:37] ThetaPhi_: *calculating*
[11:37] mikemccand: :)
[11:37] mikemccand: invert 32 + 9/5 * C
[11:38] ThetaPhi_: or ask google
[11:38] ThetaPhi_: -2.22 °C 
[11:39] ThetaPhi_: (Link: 
http://www.google.de/search?hl=de&safe=off&q=28+fahrenheit+in+celsius&meta=&aq=f&oq=)http://www.google.de/search?hl=de&safe=off&q=28+fahrenheit+in+celsius&meta=&aq=f&oq=
[11:39] mikemccand: if i jump in and see that, to use FTE i only have to 
implement to an interface (TermsEnumFilter) that has getInitialTerm/accept... I 
think that's more approachable than figuring out the relationship  (methods, 
state) to an abstract parent class
[11:39] mikemccand: nice
[11:40] mikemccand: ugh, zillions of conflicts on backporting thread safe 
spellchecker
[11:40] ThetaPhi_: if the abstract parent class makes all other methods final 
its the same
[11:41] ThetaPhi_: but ok, you could change MTQ to just return the interface
[11:41] mikemccand: not really the same -- you have to define when each 
overridden method is allowed/supposed to invoke the methods from the parent
[11:42] mikemccand: once (if) we make the interaction stateless,we suddenly 
have the freedom to make it a separate interface...
[11:43] ThetaPhi_: you mean calling next() inside accept() -> loops forever
[11:43] mikemccand: yeah :(
[11:43] mikemccand: of course... that freedom of the orig FilteredTermEnum is 
what made NRQ sneakiness possible in the first place ;)
[11:44] ThetaPhi_: (which was a hack, you have seen the comment: something 
like: this relys on how setEnum() works, if this changes in the superclass this 
enum will no longer work)
[11:45] ThetaPhi_: this was a comment before i removed the recursion in one of 
this issues
[11:45] mikemccand: right -- but it was the right hting to do at the time
[11:45] mikemccand: sure was tricky to undertsand :)
[11:45] ThetaPhi_: (ore remove the recursion, like it is now in trunk)
[11:45] mikemccand: you were bound by FTE
[11:45] mikemccand: yes, that's better
[11:45] mikemccand: but i like separate interface best -- it removes all shared 
state -- one thing describes what's filtered, the other implements according to 
that
[11:46] mikemccand: then MTQ could almost simply accept a TermsEnum.  the only 
difference is difference() -- ha
[11:46] mikemccand: it's only fuzzy that uses difference() right?
[11:46] ThetaPhi_: yes
[11:46] ThetaPhi_: i was thinking about that
[11:47] ThetaPhi_: because a empty prefix filter could simply return the 
termsenum
[11:47] mikemccand: ahh yes nice opto
[11:47] ThetaPhi_: PrefixQuery("")
[11:47] ThetaPhi_: would match all documents that have at least any term
[11:47] mikemccand: actually how come contrib/queries' TermsFilter isn't a 
query filter wrapper around an MTQ?
[11:47] ThetaPhi_: in automaton we need this for the catch all .* case
[11:47] mikemccand: ahh
[11:48] ThetaPhi_: at the moment it generates a PrefixTermsEnum("")
[11:48] mikemccand: so maybe AcceptResult has a float difference?
[11:48] ThetaPhi_: which is slower than just returning the TermsEnum of the 
reader, just because of difference
[11:49] ThetaPhi_: because it calls startsWith for each term (which in fact 
does nothing because termref.length==0
[11:49] mikemccand: ugh
[11:49] ThetaPhi_: so not that bad
[11:49] mikemccand: much better to simple return .terms() for field.  or, 
rewrite to MatchAllDocs
[11:49] ThetaPhi_: or a termrangequery(null,null) (often used in solr)
[11:49] ThetaPhi_: no
[11:49] ThetaPhi_: MatchAllDocs is different
[11:50] ThetaPhi_: because the Prefix("") case only returns document that have 
at least one term of that field
[11:50] mikemccand: ahhh yes
[11:50] ThetaPhi_: this is needed for facetting i think in Solr
[11:50] ThetaPhi_: not sure
[11:50] mikemccand: got it
[11:50] ThetaPhi_: in NRQ its fast, it just enumerates all low-prec terms
[11:51] ThetaPhi_: 16 for precStep=4
[11:51] mikemccand: nice
[11:51] mikemccand: NRQ is a great step forward for lucene
[11:51] mikemccand: would be nice if MTQ could simply accept a TermsEnum
[11:52] mikemccand: problem is difference()
[11:52] ThetaPhi_: yes
[11:52] ThetaPhi_: and is only used by fuzzy
[11:52] mikemccand: yes, annoying for just that one case
[11:52] mikemccand: contrib's TermsFilter really ought to be an MTQ
[11:52] ThetaPhi_: because robert asked for a alltermsenum, which is stupied
[11:53] mikemccand: yeah
[11:53] ThetaPhi_: i only said, user prefix("") whoich is ok, the overhead is 
minimal, but in this case it would be zero
[11:53] mikemccand: we could allow MTQ to accept either a TersmsEnum, or, a 
TermsEnumFilter (this new stateless interface, with AcceptResult also having a 
float difference field)
[11:53] ThetaPhi_: aaaah
[11:54] ThetaPhi_: but that looks like attributesource
[11:54] ThetaPhi_: differenceattribute
[11:54] mikemccand: yeah that's true
[11:54] ThetaPhi_: termsenums are already with attributes
[11:54] mikemccand: curious.  so then MTQ could accept only TermsEnum, but, ask 
for its attrs, and if that's non-null, as for differenceattr, and if htat's non 
null, use it
[11:55] ThetaPhi_: fuzzyenum just add this attribute
[11:55] mikemccand: i like that!
[11:55] ThetaPhi_: and fuzzyquery requests it
[11:55] mikemccand: and MTQ respects it too
[11:55] ThetaPhi_: and the default of this attribute is 1.0
[11:55] mikemccand: ye
[11:55] mikemccand: yes
[11:55] ThetaPhi_: so mtq just asks always for this attribute in scoring 
boolean rewrite
[11:56] mikemccand: yes
[11:56] mikemccand: and, MTQ is much simplified to accept a TermsEnum for its 
terms
[11:56] ThetaPhi_: yes
[11:56] mikemccand: and we can make a stateless API for filtering a TermsEnum
[11:57] ThetaPhi_: by the way, I did not check ever<ything in flex
[11:57] mikemccand: that's ok -- it's an immense number of changes!
[11:57] ThetaPhi_: but if you wrap an enum, you must override attributes() to 
return the attributes of the delegate
[11:57] mikemccand: i still keep finding bugs in the emulation layers.
[11:57] mikemccand: hmm you're right
[11:58] ThetaPhi_: so UnionTermsEnum or like so must override attributes() and 
call delegate.attributes()
[11:58] mikemccand: or, fuzzy query could be the only one that does htis
[11:58] ThetaPhi_: yes
[11:58] ThetaPhi_: no
[11:58] ThetaPhi_: let it in MTQ
[11:58] ThetaPhi_: we yesterday talked, normally rewrite should be final in MTQ
[11:58] mikemccand: that sounds good
[11:58] ThetaPhi_: you should only change behaviour in getEnum
[11:58] mikemccand: ok
[11:59] ThetaPhi_: in trunk we currently rewrite WildCardQuery to PrefixQuery 
which is itsself an MTQ
[11:59] ThetaPhi_: totally useless
[11:59] ThetaPhi_: just return PrefixEnum in getEnum
[11:59] mikemccand: ahh right
[12:00] ThetaPhi_: in flex i changed
[12:00] ThetaPhi_: the problem are bw tests in trunk, because they check the 
rewritten thing, but we could simply remove the tests
[12:01] mikemccand: tests shouldn't check internals like that
[12:01] ThetaPhi_: yes
[12:01] ThetaPhi_: or these checks should be marked
[12:01] mikemccand: yes
[12:01] ThetaPhi_: in Junit4 with an annotation
[12:01] mikemccand: ahh what annotation?
[12:01] ThetaPhi_: and the bw tests only run tests without that annotation
[12:02] mikemccand: that sounds great
[12:02] ThetaPhi_: the problem may then still compilation
[12:03] ThetaPhi_: if it checks package protected fields and so on
[12:03] ThetaPhi_: brrr
[12:03] ThetaPhi_: we have such tests
[12:03] mikemccand: sigh.
[12:03] mikemccand: yes
[12:03] mikemccand: i remember committing them ;)
[12:03] *** mikemccand has left #lucene.
[12:03] ThetaPhi_: you always have to remove them
[12:22] *** mikemccand has joined #lucene.
[12:22] mikemccand: sorry, dropped off
[12:25] ThetaPhi_: ok i said
[12:25] ThetaPhi_: i will try out a little bit
[12:25] ThetaPhi_: and post in the automaton
[12:25] mikemccand: ok
[12:25] ThetaPhi_: its hard to always generate both pacthes
[12:26] ThetaPhi_: when we are happy, i will post a patch to 2110 and then we 
commit
[12:26] ThetaPhi_: in automaton there is more special cases to test
{quote}

> Change FilteredTermsEnum to work like Iterator, so it is not positioned and 
> next() must be always called first. Remove empty()
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2110
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2110
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2110.patch, LUCENE-2110.patch, LUCENE-2110.patch, 
> LUCENE-2110.patch
>
>
> FilteredTermsEnum is confusing as it is initially positioned to the first 
> term. It should instead work like an uninitialized TermsEnum for a field 
> before the first call to next() or seek().
> Also document that not all FilteredTermsEnums may implement seek() as eg. NRQ 
> or Automaton are not able to support this. Seeking is also not needed for MTQ 
> at all, so seek can just throw UOE.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to