+1, please fix the QueryParser (QP) bug. The parser should only distinguish query keywords from non-keywords.

On Oct 26, 2011, at 8:09 AM, Robert Muir <[email protected]> wrote:

> Use a query parser that doesn't break on whitespace as a workaround?
> Or, we can start thinking about how to fix QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2605)
>
> The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
> Allowing tokenizer access to the query string would just mean that
> your tokenizer hacks around this by trying to be a QueryParser, too,
> making matters even worse!
>
>
> On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
> <[email protected]> wrote:
>> OK, I think "query string" is a bit too specific, so more generally,
>> what I need is access from inside of a filter to the complete string
>> (not only the token) being analyzed.
>>
>> A very dirty workaround would be a "collector filter" which collects all
>> tokens after WhitespaceTokenizer and makes them somehow available to
>> the following filters, or not?
>> So at least at the last run of incrementToken() I have the original string.
>>
>> Bernd
>>
>> On 26.10.2011 10:26, Uwe Schindler wrote:
>>>
>>> The input from StringReader does not help you:
>>> - in the case of QueryParser it is *not* the query string!!!
>>> - storing it in an attribute would blow up your heap for real documents
>>>
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: [email protected]
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:[email protected]]
>>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>>> To: [email protected]
>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>
>>>> From what I can see in the debugger the analyzer chain is implemented as a
>>>> stack with last filter at the bottom and the first filter at the top.
>>>>
>>>> An analyzer query chain of:
>>>> charFilter: MappingCharFilterFactory
>>>> tokenizer : WhitespaceTokenizerFactory
>>>> filter    : PatternReplaceFilterFactory
>>>> filter    : LowerCaseFilterFactory
>>>> filter    : ShingleFilterFactory
>>>> filter    : SynonymFilterFactory
>>>>
>>>> has a chain of:
>>>> this.input(SynonymFilter) --> input(ShingleFilter) -->
>>>> input(LowerCaseFilter) --> input(PatternReplaceFilter) -->
>>>> input(WhitespaceTokenizer) --> input(MappingCharFilter) -->
>>>> input(CharReader) --> input(StringReader).str
>>>>
>>>> So I can always "see" the input of StringReader, but can I access it?
>>>>
>>>> Bernd
>>>>
>>>> On 26.10.2011 09:37, Chris Male wrote:
>>>>>
>>>>> We've also lost the full query string by the time the QP creates its
>>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>>
>>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <[email protected]> wrote:
>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>>> TokenStream calls clearAttributes() the attributes are gone, but
>>>>>> the query parser can only set the attribute *before* calling
>>>>>> incrementToken(), so you have no chance to get them, as the Tokenizer
>>>>>> cleared it before any filter can read it (unless we use an attribute
>>>>>> with clear() as a no-op, which would fail lots of tests, as it's a hack).
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>>
>>>>>>> What Uwe says is correct though. What we possibly could do is add
>>>>>>> a queryattribute that is set in a query parser (you can do that
>>>>>>> yourself though).
>>>>>>>
>>>>>>> not sure if it is worth it and if we should do it.
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> QueryParser and TokenStreams are clearly separated, there is no way
>>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>>> be, because QP is a consumer of the TS, which is used not only for
>>>>>>>> query parsing). The only chance you have is to use a ThreadLocal
>>>>>>>> that you set before the query is parsed and then use it in the
>>>>>>>> TokenFilter.
>>>>>>>>
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bernd Fehling [mailto:[email protected]]
>>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>>
>>>>>>>>> Dear list,
>>>>>>>>> while writing some TokenFilter for my analyzer chain I need access to
>>>>>>>>> the query string from inside of my TokenFilter for some comparison,
>>>>>>>>> but the filters are working with a TokenStream and get separate tokens.
>>>>>>>>> Currently I couldn't get any access to the query string.
>>>>>>>>>
>>>>>>>>> It would be great to have such functionality in lucene/solr.
>>>>>>>>>
>>>>>>>>> Should I write a jira issue for it or is there a wish list somewhere?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Bernd
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> For additional commands, e-mail: [email protected]
>>>>
>>>> --
>>>> *************************************************************
>>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>>> [email protected]                33615 Bielefeld
>>>>
>>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>>> *************************************************************
>
> --
> lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
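
For readers following the thread, the ThreadLocal workaround Uwe mentions (set the query string before parsing, read it inside the filter) could look roughly like the sketch below. It deliberately uses plain Java stand-ins instead of real Lucene classes: QueryStringContext, ContextAwareFilter and WhitespaceParserSketch are hypothetical names for illustration, not Lucene or Solr APIs. In a real setup the holder would presumably be set by the code that invokes the query parser and read from incrementToken() of a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical holder: carries the raw query string for the current thread. */
final class QueryStringContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private QueryStringContext() {}

    static void set(String query) { CURRENT.set(query); }

    /** Returns the query string for this thread, or null if none was set. */
    static String get() { return CURRENT.get(); }

    static void clear() { CURRENT.remove(); }
}

/** Stand-in for a TokenFilter: it receives single tokens only. */
final class ContextAwareFilter {
    /**
     * Lowercases the token and, if a query string is available on this
     * thread, appends the token's character offset within that string.
     */
    String annotate(String token) {
        String lowered = token.toLowerCase(Locale.ROOT);
        String full = QueryStringContext.get();
        if (full == null) {
            return lowered; // e.g. indexing, where no query string was set
        }
        return lowered + "@" + full.indexOf(token);
    }
}

/** Stand-in for a parser that splits on whitespace before analysis. */
final class WhitespaceParserSketch {
    static List<String> parse(String query, ContextAwareFilter filter) {
        QueryStringContext.set(query); // set *before* any token is produced
        try {
            List<String> out = new ArrayList<>();
            for (String term : query.trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    out.add(filter.annotate(term));
                }
            }
            return out;
        } finally {
            QueryStringContext.clear(); // avoid leaking into pooled threads
        }
    }
}

final class QueryStringDemo {
    public static void main(String[] args) {
        // prints [foo@0, bar@4]
        System.out.println(
            WhitespaceParserSketch.parse("Foo Bar", new ContextAwareFilter()));
    }
}
```

Two caveats with this approach: it only works when parsing and analysis run on the same thread, and the holder has to be cleared in a finally block so that a pooled thread does not carry a stale query string into the next request.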
