Thanks, Robert, for pointing me to the issue. That's exactly my problem,
because I'm trying to implement "query-time synonym expansion".
For that, it is necessary to "clean up" the synonym result with the help
of the query string.

Interestingly, my FAST system calls the synonym stage twice during query parsing:
...
synonym
parse
synonym
...

I would be pleased to have this fixed so that QueryParser is no longer
also a tokenizer. But having looked into QueryParser (which scared me
to death), can it be fixed at all without other bad side effects?

Using a phrase query works so far to get the complete query string
to the analyzer in one piece.


On 26.10.2011 14:09, Robert Muir wrote:
Use a query parser that doesn't break on whitespace as a workaround?
Or, we can start thinking about how to fix QueryParser
(https://issues.apache.org/jira/browse/LUCENE-2605)

The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
Allowing tokenizer access to the query string would just mean that
your tokenizer hacks around this by trying to be a QueryParser, too,
making matters even worse!
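
To illustrate the whitespace problem, here is a toy simulation in plain Java. It uses no Lucene classes; `WhitespaceSplitDemo`, `analyzerCalls` and `expand` are hypothetical names, and the synonym lookup is simplified to matching a whole chunk of text. It shows why a multi-word synonym such as "new york" → "nyc" cannot match when the parser hands the analyzer one term at a time, while a quoted phrase reaches it in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: a parser that splits unquoted input on
// whitespace before handing each piece to the "analyzer".
final class WhitespaceSplitDemo {

    // Returns the chunks of text the analyzer would be asked to analyze.
    static List<String> analyzerCalls(String query, boolean quoted) {
        List<String> calls = new ArrayList<>();
        if (quoted) {
            // Phrase: the text between the quotes reaches the analyzer in one piece.
            calls.add(query);
        } else {
            // Unquoted: every whitespace-separated term is analyzed on its own.
            for (String term : query.trim().split("\\s+")) {
                calls.add(term);
            }
        }
        return calls;
    }

    // Simplified synonym stage: looks up each analyzed chunk as a whole.
    static List<String> expand(String query, boolean quoted, Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        for (String chunk : analyzerCalls(query, quoted)) {
            out.add(chunk);
            String synonym = synonyms.get(chunk);
            if (synonym != null) {
                out.add(synonym);
            }
        }
        return out;
    }
}
```

In the real QueryParser the quoted chunk is still tokenized by the analyzer, so a synonym filter sees "new" and "york" within a single TokenStream instead of two separate ones; that is the property the phrase-query workaround relies on.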


On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
<[email protected]>  wrote:
OK, I think "query string" is a bit too specific. More generally,
what I need is access, from inside a filter, to the complete string
being analyzed (not just the current token).

A very dirty workaround would be a "collector filter" that collects all
tokens after the WhitespaceTokenizer and somehow makes them available to
the following filters, wouldn't it? Then, at least on the last call of
incrementToken(), I would have the original string.
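
A rough sketch of the "collector filter" idea, using a hypothetical minimal token-stream contract (`SimpleTokenStream.next()` returning `null` when exhausted) instead of Lucene's attribute-based `incrementToken()`. The filter drains its input on the first call, remembers every token, and then replays them, so downstream logic can ask for the whole (whitespace-normalized) text. All class names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal token-stream contract: next token, or null when done.
interface SimpleTokenStream {
    String next();
}

// Upstream "tokenizer": splits on whitespace, like WhitespaceTokenizer.
final class WhitespaceSource implements SimpleTokenStream {
    private final Iterator<String> it;

    WhitespaceSource(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        this.it = tokens.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// The "collector filter": drains its input once, buffers all tokens,
// then replays them one by one.
final class CollectorFilter implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private List<String> buffered; // null until the input has been drained
    private int pos;

    CollectorFilter(SimpleTokenStream input) {
        this.input = input;
    }

    private void fill() {
        if (buffered != null) {
            return;
        }
        buffered = new ArrayList<>();
        for (String t = input.next(); t != null; t = input.next()) {
            buffered.add(t);
        }
    }

    // The text as seen after the tokenizer, joined with single spaces.
    String collectedText() {
        fill();
        return String.join(" ", buffered);
    }

    public String next() {
        fill();
        return pos < buffered.size() ? buffered.get(pos++) : null;
    }
}
```

Two caveats from this thread still apply: under QueryParser the collector only sees the chunk the parser passed to the analyzer, not the full query string, and buffering every token has the same kind of heap cost Uwe mentions for real documents.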

Bernd

On 26.10.2011 10:26, Uwe Schindler wrote:

The input from StringReader does not help you:
- in the case of QueryParser it is *not* the query string!!!
- storing it in an attribute would blow up your heap for real documents

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


-----Original Message-----
From: Bernd Fehling [mailto:[email protected]]
Sent: Wednesday, October 26, 2011 10:06 AM
To: [email protected]
Subject: Re: accessing the query string from inside TokenFilter

From what I can see in the debugger, the analyzer chain is implemented as a
stack, with the last filter at the bottom and the first filter at the top.

An analyzer query chain of:
charFilter: MappingCharFilterFactory
tokenizer : WhitespaceTokenizerFactory
filter    : PatternReplaceFilterFactory
filter    : LowerCaseFilterFactory
filter    : ShingleFilterFactory
filter    : SynonymFilterFactory

has a chain of:
this.input(SynonymFilter) --> input(ShingleFilter) -->
input(LowerCaseFilter) --> input(PatternReplaceFilter) -->
input(WhitespaceTokenizer) --> input(MappingCharFilter) -->
input(CharReader) --> input(StringReader).str

So I can always "see" the input of StringReader, but can I access it?
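
The structure in the debugger is the usual decorator pattern: each stage holds a reference to its upstream stage in `input`, and the consumer only talks to the outermost one. A simplified model in plain Java (hypothetical `ChainDemo` class, not Lucene code) builds the chain from the reader outward and then walks the `input` references back down:

```java
import java.util.ArrayList;
import java.util.List;

final class ChainDemo {

    // One stage of the chain: a name plus a reference to its upstream input.
    static final class Stage {
        final String name;
        final Stage input; // null for the innermost source (the reader)

        Stage(String name, Stage input) {
            this.name = name;
            this.input = input;
        }
    }

    // Builds the chain roughly the way a factory-based analyzer does: start
    // from the reader, then wrap each configured component around the previous one.
    static Stage build(List<String> innermostFirst) {
        Stage current = null;
        for (String name : innermostFirst) {
            current = new Stage(name, current);
        }
        return current; // outermost stage: the one the consumer calls
    }

    // Follows the `input` references from the outermost stage inward.
    static List<String> walk(Stage outermost) {
        List<String> names = new ArrayList<>();
        for (Stage s = outermost; s != null; s = s.input) {
            names.add(s.name);
        }
        return names;
    }
}
```

Even if the reader at the bottom could be reached, under QueryParser it only holds the text the parser chose to analyze, not the full query string.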

Bernd

On 26.10.2011 09:37, Chris Male wrote:

We've also lost the full query string by the time the QP creates its
TokenStream, right? Because the QP tokenizes on whitespace.

On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <[email protected]> wrote:

Hi Simon,

The problem is that the consumer and producer roles are swapped. Once the
TokenStream calls clearAttributes(), the attributes are gone, but the
query parser can only set the attribute *before* calling
incrementToken(). So you have no chance to get it, as the Tokenizer
clears it before any filter can read it (unless we use an attribute
whose clear() is a no-op, which would fail lots of tests, as it's a hack).
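
A small simulation of the ordering problem described above. It is plain Java with hypothetical classes and a string map standing in for the shared attributes, not Lucene's real `AttributeSource` API. The consumer stores a "query" value before pulling a token, but the tokenizer's `incrementToken()` clears all attributes first, so the filter above it never sees the value:

```java
import java.util.HashMap;
import java.util.Map;

final class ClearAttributesDemo {

    // Shared attribute state, standing in for the attributes all stages share.
    static final class Attributes {
        final Map<String, String> values = new HashMap<>();

        void clear() {
            values.clear();
        }
    }

    // Innermost stage: like a Lucene Tokenizer, it clears attributes first.
    static final class Tokenizer {
        final Attributes atts;
        private final String[] tokens;
        private int pos;

        Tokenizer(Attributes atts, String text) {
            this.atts = atts;
            this.tokens = text.split("\\s+");
        }

        boolean incrementToken() {
            atts.clear(); // plays the role of clearAttributes()
            if (pos >= tokens.length) {
                return false;
            }
            atts.values.put("term", tokens[pos++]);
            return true;
        }
    }

    // A filter that wants to read the query string the consumer stored.
    static final class QueryAwareFilter {
        final Tokenizer input;
        String seenQuery; // what the filter actually observed

        QueryAwareFilter(Tokenizer input) {
            this.input = input;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            seenQuery = input.atts.values.get("query");
            return true;
        }
    }

    // The consumer (query parser) sets the attribute, then pulls one token.
    static String run(String query) {
        Attributes atts = new Attributes();
        QueryAwareFilter filter = new QueryAwareFilter(new Tokenizer(atts, query));
        atts.values.put("query", query); // set *before* incrementToken()
        filter.incrementToken();         // the tokenizer clears it first
        return filter.seenQuery;         // null: the value is already gone
    }
}
```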

Uwe


-----Original Message-----
From: Simon Willnauer [mailto:[email protected]]
Sent: Wednesday, October 26, 2011 9:21 AM
To: [email protected]
Subject: Re: accessing the query string from inside TokenFilter

What Uwe says is correct, though. What we could possibly do is add
a query attribute that is set by the query parser (you can do that
yourself, though).

Not sure if it is worth it, or whether we should do it.

simon

On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <[email protected]> wrote:

Hi,

QueryParser and TokenStreams are clearly separated; there is no way
to get the query string from inside a TokenStream (and there cannot
be, because the QP is a consumer of the TS, and TokenStreams are used
for more than query parsing). The only option you have is to use a
ThreadLocal that you set before the query is parsed and then read in the
TokenFilter.
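
A minimal sketch of the ThreadLocal workaround, assuming parsing and analysis run on the same thread. `QueryStringHolder`, `SynonymCleanup` and the "keep a synonym only if it occurs in the query" rule are all hypothetical; in a real setup the lookup would sit inside the TokenFilter and the parse call inside the `try` block:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder: the application sets the full query string before
// parsing, and filters running on the same thread read it back.
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String query) { CURRENT.set(query); }

    static String get() { return CURRENT.get(); }

    // remove() drops the entry, so pooled threads do not carry a stale value
    // into the next request.
    static void clear() { CURRENT.remove(); }
}

// Stand-in for the per-token logic of a custom TokenFilter.
final class SynonymCleanup {

    // Hypothetical rule: keep a candidate synonym only if its text occurs in
    // the original query string; keep everything if no query string is set.
    static List<String> keepMatching(List<String> candidates) {
        String query = QueryStringHolder.get();
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (query == null || query.contains(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    // Application side: set the holder, "parse", and always clean up.
    static List<String> parseWithQuery(String query, List<String> candidates) {
        QueryStringHolder.set(query);
        try {
            // In a real setup, QueryParser.parse(query) would run here and the
            // analyzer's filter would call QueryStringHolder.get().
            return keepMatching(candidates);
        } finally {
            QueryStringHolder.clear();
        }
    }
}
```

Because the holder is per-thread, forgetting the `finally` block, or analyzing on a different thread than the one that set the value, silently gives the filter a stale string or none at all.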

Uwe


-----Original Message-----
From: Bernd Fehling [mailto:[email protected]]
Sent: Wednesday, October 26, 2011 8:33 AM
To: [email protected]
Subject: accessing the query string from inside TokenFilter

Dear list,
While writing a TokenFilter for my analyzer chain, I need access to the
query string from inside my TokenFilter for some comparison, but the
filters work on a TokenStream and only receive separate tokens.
So far I couldn't find any way to access the query string.

It would be great to have such functionality in Lucene/Solr.

Should I open a JIRA issue for it, or is there a wish list somewhere?

Best regards
Bernd


---------------------------------------------------------------------

To unsubscribe, e-mail: [email protected] For
additional commands, e-mail: [email protected]


--
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
[email protected]                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
