+1, please fix the QP bug. QueryParser should only identify query keywords and
non-keywords.



On Oct 26, 2011, at 8:09 AM, Robert Muir <[email protected]> wrote:

> Use a query parser that doesn't break on whitespace as a workaround?
> Or we can start thinking about how to fix QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2605)
> 
> The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
> Allowing tokenizer access to the query string would just mean that
> your tokenizer hacks around this by trying to be a QueryParser, too,
> making matters even worse!
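
To make the whitespace problem concrete, here is a toy sketch in plain Java. It uses no Lucene classes: `analyze` and `parseSplittingFirst` are made-up names, and the two-word shingling is only a stand-in for something like ShingleFilter or SynonymFilter. It assumes the parser hands each whitespace-separated chunk to the analyzer on its own, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WhitespaceSplitSketch {

    /** Toy analyzer: whitespace tokenization followed by two-word shingles. */
    static List<String> analyze(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>(Arrays.asList(words));
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]); // needs two adjacent words
        }
        return out;
    }

    /** Mimics a parser that splits on whitespace first and analyzes each chunk alone. */
    static List<String> parseSplittingFirst(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk)); // the analyzer only ever sees one word
        }
        return out;
    }
}
```

For the query `new york`, `analyze` on the whole string produces the shingle `new york`, while `parseSplittingFirst` never does, because no chunk ever contains two words.
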
> 
> 
> On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
> <[email protected]> wrote:
>> OK, I think "query string" is a bit too specific. More generally,
>> what I need is access from inside a filter to the complete string
>> (not only the current token) being analyzed.
>> 
>> A very dirty workaround would be a "collector filter" which collects all
>> tokens after WhitespaceTokenizer and makes them somehow available to
>> the following filters, or not?
>> So at least at the last run of incrementToken() I would have the original string.
>> 
>> Bernd
>> 
>> On 26.10.2011 10:26, Uwe Schindler wrote:
>>> 
>>> The input from StringReader does not help you:
>>> - in the case of QueryParser it is *not* the query string!!!
>>> - storing it in an attribute would blow up your heap for real documents
>>> 
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: [email protected]
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:[email protected]]
>>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>>> To: [email protected]
>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>> 
>>>> From what I can see in the debugger, the analyzer chain is implemented as a
>>>> stack with the last filter at the bottom and the first filter at the top.
>>>> 
>>>> An analyzer query chain of:
>>>> charFilter: MappingCharFilterFactory
>>>> tokenizer : WhitespaceTokenizerFactory
>>>> filter    : PatternReplaceFilterFactory
>>>> filter    : LowerCaseFilterFactory
>>>> filter    : ShingleFilterFactory
>>>> filter    : SynonymFilterFactory
>>>> 
>>>> has a chain of:
>>>> this.input(SynonymFilter) -->  input(ShingleFilter) -->
>>>> input(LowerCaseFilter) -->  input(PatternReplaceFilter) -->
>>>> input(WhitespaceTokenizer) -->  input(MappingCharFilter) -->
>>>> input(CharReader) -->  input(StringReader).str
>>>> 
>>>> So I can always "see" the input of StringReader, but can I access it?
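
The nesting visible in the debugger is the usual decorator pattern: each stage only holds a reference to its direct input, and only the innermost stage ever touches the raw text. A rough sketch with made-up stand-in types (`PullStream`, `WordSource`, `LowerCase`, `Suffix` and `ChainSketch` are hypothetical, not Lucene classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Minimal pull-based token stream; a stand-in, not Lucene's TokenStream. */
interface PullStream {
    String next(); // returns null when the stream is exhausted
}

/** Plays the tokenizer role: the only stage that sees the raw text. */
final class WordSource implements PullStream {
    private final Iterator<String> words;

    WordSource(String text) {
        this.words = Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    @Override
    public String next() {
        return words.hasNext() ? words.next() : null;
    }
}

/** A filter: knows its input stream, nothing about the original string. */
final class LowerCase implements PullStream {
    private final PullStream input;

    LowerCase(PullStream input) {
        this.input = input;
    }

    @Override
    public String next() {
        String token = input.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
}

/** Another filter, wrapped around the previous one. */
final class Suffix implements PullStream {
    private final PullStream input;
    private final String suffix;

    Suffix(PullStream input, String suffix) {
        this.input = input;
        this.suffix = suffix;
    }

    @Override
    public String next() {
        String token = input.next();
        return token == null ? null : token + suffix;
    }
}

final class ChainSketch {
    /** Pulls tokens from the outermost stage until it is exhausted. */
    static List<String> drain(PullStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    /** The last filter in the chain is the outermost object. */
    static List<String> run(String text) {
        return drain(new Suffix(new LowerCase(new WordSource(text)), "_x"));
    }
}
```

Seeing the string in the debugger means walking private fields through every wrapper; in this sketch nothing in the `PullStream` interface exposes it to the filters.
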
>>>> 
>>>> Bernd
>>>> 
>>>> On 26.10.2011 09:37, Chris Male wrote:
>>>>> 
>>>>> We've also lost the full query string by the time the QP creates its
>>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>> 
>>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <[email protected]> wrote:
>>>>> 
>>>>>> Hi Simon,
>>>>>> 
>>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>>> TokenStream calls clearAttributes(), the attributes are gone, but the
>>>>>> query parser can only set the attribute *before* calling
>>>>>> incrementToken(). So you have no chance to get them, as the Tokenizer
>>>>>> clears them before any filter can read them (unless we use an attribute
>>>>>> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
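
The ordering problem can be simulated without Lucene. In this sketch all names (`Attributes`, `ToyTokenizer`, `ToyFilter`, the "query" and "term" keys) are invented for illustration, and a plain map stands in for the shared attribute state. The consumer stores the query before pulling the first token, the tokenizer clears everything at the start of each step, and the filter looks too late:

```java
import java.util.HashMap;
import java.util.Map;

final class ClearAttributesSketch {

    /** Shared per-stream state, standing in for the attributes of a TokenStream. */
    static final class Attributes {
        final Map<String, String> values = new HashMap<>();

        void clear() {
            values.clear();
        }
    }

    /** Producer: clears all attributes before emitting each token. */
    static final class ToyTokenizer {
        private final Attributes atts;
        private final String[] words;
        private int pos = 0;

        ToyTokenizer(Attributes atts, String text) {
            this.atts = atts;
            this.words = text.trim().split("\\s+");
        }

        boolean incrementToken() {
            atts.clear(); // also wipes anything the consumer stored beforehand
            if (pos >= words.length) {
                return false;
            }
            atts.values.put("term", words[pos++]);
            return true;
        }
    }

    /** Filter: can only look at the attributes after its input has advanced. */
    static final class ToyFilter {
        private final ToyTokenizer input;
        private final Attributes atts;
        String seenQuery;
        String seenTerm;

        ToyFilter(ToyTokenizer input, Attributes atts) {
            this.input = input;
            this.atts = atts;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            seenQuery = atts.values.get("query");
            seenTerm = atts.values.get("term");
            return true;
        }
    }

    /** Returns what the filter could read of the "query" value set by the consumer. */
    static String queryVisibleToFilter(String query) {
        Attributes atts = new Attributes();
        ToyFilter filter = new ToyFilter(new ToyTokenizer(atts, query), atts);
        atts.values.put("query", query); // consumer sets it before pulling tokens
        filter.incrementToken();
        return filter.seenQuery;
    }
}
```

`queryVisibleToFilter` returns null for any input: the term set by the tokenizer survives, the value set by the consumer does not.
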
>>>>>> 
>>>>>> Uwe
>>>>>> 
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: [email protected]
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>> 
>>>>>>> What Uwe says is correct though. What we possibly could do is add
>>>>>>> a query attribute that is set in a query parser (you can do that
>>>>>>> yourself though).
>>>>>>>
>>>>>>> Not sure if it is worth it and if we should do it.
>>>>>>> 
>>>>>>> simon
>>>>>>> 
>>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> QueryParser and TokenStreams are clearly separated, there is no way
>>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>>> be, because QP is a consumer of the TS, which is used not only for
>>>>>>>> query parsing). The only chance you have is to use a ThreadLocal
>>>>>>>> that you set before the query is parsed and then use it in the
>>>>>>>> TokenFilter.
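
A minimal sketch of the ThreadLocal idea, in plain Java with no Lucene dependency. `QueryStringContext` and `ThreadLocalQuerySketch` are hypothetical names, and `filter` only stands in for whatever a real TokenFilter would do in incrementToken(). It assumes parsing and analysis run on the same thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-thread holder for the raw query; not part of Lucene or Solr. */
final class QueryStringContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private QueryStringContext() {}

    static void set(String rawQuery) {
        CURRENT.set(rawQuery);
    }

    /** Returns null when no query is being parsed on this thread. */
    static String get() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }
}

final class ThreadLocalQuerySketch {

    /** Stand-in for filter logic: tags each token with the length of the raw query. */
    static List<String> filter(List<String> tokens) {
        String raw = QueryStringContext.get();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(raw == null ? token : token + "/" + raw.length());
        }
        return out;
    }

    /** Caller side: set the value before parsing, always clear it afterwards. */
    static List<String> parse(String rawQuery) {
        QueryStringContext.set(rawQuery);
        try {
            return filter(Arrays.asList(rawQuery.trim().split("\\s+")));
        } finally {
            // Pooled threads are reused, so a stale value must not leak
            // into the next request.
            QueryStringContext.clear();
        }
    }
}
```

The try/finally is the important part: without it a reused thread would hand the previous query to the next request. When the same analyzer runs at index time, `get()` returns null, so the filter has to cope with that case.
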
>>>>>>>> 
>>>>>>>> Uwe
>>>>>>>> 
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
>>>>>>>> eMail: [email protected]
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bernd Fehling [mailto:[email protected]]
>>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>> 
>>>>>>>>> Dear list,
>>>>>>>>> while writing a TokenFilter for my analyzer chain, I need access to the
>>>>>>>>> query string from inside my TokenFilter for some comparison, but the
>>>>>>>>> filters work with a TokenStream and get separate tokens.
>>>>>>>>> Currently I can't get any access to the query string.
>>>>>>>>>
>>>>>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>>>>>
>>>>>>>>> Should I open a JIRA issue for it, or is there a wish list somewhere?
>>>>>>>>> 
>>>>>>>>> Best regards
>>>>>>>>> Bernd
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> For additional commands, e-mail: [email protected]
>>>> 
>>>> --
>>>> *************************************************************
>>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>>> [email protected]                33615 Bielefeld
>>>> 
>>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>>> *************************************************************
>> 
>> --
>> *************************************************************
>> Bernd Fehling                Universitätsbibliothek Bielefeld
>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>> [email protected]                33615 Bielefeld
>> 
>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>> *************************************************************
> 
> 
> 
> -- 
> lucidimagination.com
