Hi Ahmed!

2011/12/1 Ahmed Saidi <ci7nu...@gmail.com>:
> Hi Viet,
> I want to build a filter that splits tokens containing letters
> [a-zA-Z] and digits into two or more tokens.
> For example, if this filter gets a token like "test123" it will split
> it into two tokens, "test" and "123", and it will split "ci7nucha"
> into "ci", "7" and "nucha".
> My implementation does that, but rather than converting the split
> tokens into TermQuerys, it converts them into a PhraseQuery.
> I want to build a filter like Solr's WordDelimiterFilter.

But then, I think, you have to extend/change the QueryParser. When a
field query is parsed, the result depends on the tokenization of the
text. If the tokenization yields a single term, the outcome of
parsing the field query is a TermQuery. If not, as in your case, the
outcome is either a BooleanQuery consisting of several TermQuerys or
a PhraseQuery (cf. QueryParser::getFieldQuery(const TCHAR* _field,
TCHAR* queryText) in QueryParser.cpp). I think you need the
BooleanQuery as the outcome, right? Which of the two is built seems
to depend on the position increments returned by the filter, but I
haven't understood this completely yet.
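In case it helps, here is a minimal sketch of just the splitting rule
you described, outside of any CLucene API (the function name
splitToken and the use of std::string instead of TCHAR are my own
illustration, not CLucene code):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: break a token at every letter/digit boundary,
// so "test123" yields {"test", "123"} and "ci7nucha" yields
// {"ci", "7", "nucha"}. A real TokenFilter would emit each part as
// its own token instead of collecting them in a vector.
std::vector<std::string> splitToken(const std::string& token) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : token) {
        // start a new part whenever we cross a digit/non-digit boundary
        if (!current.empty() &&
            std::isdigit(static_cast<unsigned char>(c)) !=
                std::isdigit(static_cast<unsigned char>(current.back()))) {
            parts.push_back(current);
            current.clear();
        }
        current += c;
    }
    if (!current.empty())
        parts.push_back(current);
    return parts;
}
```

As far as I understand the position-increment behaviour: if every
part after the first is emitted with a position increment of 1, the
parser sees consecutive positions and builds a PhraseQuery; if the
parts are stacked at the same position (increment 0, like synonyms),
it should build a BooleanQuery of TermQuerys instead. But as I said,
I haven't verified this completely yet.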

Kind regards,

Veit

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers