Thanks. Switching to SimpleQueryParser does resolve the exceptions; before, I was using classic.QueryParser/MultiFieldQueryParser. The only difference is the tilde values, which I guess are now integers (edit distances) rather than similarity values like 0.6.
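
A minimal sketch of what the integer tilde could look like with SimpleQueryParser. The class name, the helper method, and the field name "body" are hypothetical; the assumption is that the suffix is a maximum edit distance, and Lucene supports at most 2 edits for fuzzy terms.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.Query;

public class FuzzyTildeExample {
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    // Hypothetical helper: "body" is an assumed field name, not one from the thread.
    static Query fuzzyTerm(String term, int maxEdits) {
        // The single-field constructor enables all syntax features, including fuzzy (~N).
        SimpleQueryParser parser = new SimpleQueryParser(ANALYZER, "body");
        // Drop any tilde the user typed, then append our own integer edit distance.
        return parser.parse(term.replace("~", "") + "~" + maxEdits);
    }
}
```

Something like fuzzyTerm("AND", 2) should then produce a query instead of a ParseException, since SimpleQueryParser does not treat AND as an operator keyword.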

On 19/08/2024 12:32, Uwe Schindler wrote:
Hi,

Basically, my only recommendation is to NOT use the standard query parser in code that is usable by external users. Lucene has multiple query parsers; the standard one is strict, has a rigid syntax, and is therefore not targeted at end users. This also applies to Solr: use dismax instead of the Lucene standard parser there! As the standard parser also allows passing any field name, it also risks security issues (people searching on arbitrary fields that were not intended to be public). In short: treat the default Lucene query parser as vulnerable to something like "SQL injection attacks" (just replace "SQL" with "Lucene Query Syntax").

For end-user queries I recommend using the "SimpleQueryParser", which throws no exceptions and lets you configure which syntax features are enabled and which are not. It also does not allow users to pass field names, so it is basically written for the use case "user enters some search terms without syntax knowledge".
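
As a rough sketch of that recommendation (not code from this thread): the field names "title" and "body", the class name, and the choice of enabled flags are all assumptions made for illustration.

```java
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

public class EndUserQueryExample {
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    static Query parseUserInput(String userInput) {
        // Enable only phrases, fuzzy terms and whitespace splitting;
        // every other operator character is treated as plain text.
        int flags = SimpleQueryParser.PHRASE_OPERATOR
                | SimpleQueryParser.FUZZY_OPERATOR
                | SimpleQueryParser.WHITESPACE_OPERATOR;
        // The fields to search are fixed here (assumed names with equal weights);
        // the user cannot select other fields through the query string.
        SimpleQueryParser parser =
                new SimpleQueryParser(ANALYZER, Map.of("title", 1.0f, "body", 1.0f), flags);
        // Require all terms to match (the parser's default is SHOULD).
        parser.setDefaultOperator(BooleanClause.Occur.MUST);
        // parse() does not declare ParseException, so malformed input needs no try/catch.
        return parser.parse(userInput);
    }
}
```

Calling parseUserInput(") AND 3318=4385 AND (7778=7778") with the string from the logs should simply return a Query rather than fail.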

If you want to apply hardcoded filters, never ever construct plain string queries like that; they are always vulnerable to "SQL injection"-style issues. Pass the user-entered queries to SimpleQueryParser, or don't parse any syntax at all and instead use "match" queries (as they are called in Elasticsearch/OpenSearch; in Lucene they are available in the QueryBuilder class, e.g. QueryBuilder#createBooleanQuery). In addition, create your hardcoded filters as native Query instances (like TermQuery, PhraseQuery, ...). Finally, combine the different queries with BooleanQuery and pass the combined result to IndexSearcher. For programmatic query construction, the Query subclasses are the way to go.
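
The combination described above could be sketched roughly as follows. The field names "body" and "status", the filter value "published", and the class name are hypothetical and only serve the illustration.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.QueryBuilder;

public class FilteredSearchExample {
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    static Query build(String userInput) {
        // "Match"-style query: the input is only analyzed, no query syntax is interpreted.
        QueryBuilder builder = new QueryBuilder(ANALYZER);
        Query userQuery =
                builder.createBooleanQuery("body", userInput, BooleanClause.Occur.MUST);
        if (userQuery == null) {
            // createBooleanQuery returns null when analysis produces no terms.
            userQuery = new MatchNoDocsQuery();
        }
        // The hardcoded filter is a native Query instance, never a concatenated string.
        Query filter = new TermQuery(new Term("status", "published"));
        return new BooleanQuery.Builder()
                .add(userQuery, BooleanClause.Occur.MUST)
                .add(filter, BooleanClause.Occur.FILTER)
                .build();
    }
}
```

The resulting Query can be passed to IndexSearcher#search as usual; FILTER clauses must match but do not contribute to the score.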

Uwe

On 11.08.2024 at 10:37, Greg Huber wrote:
Looking through my httpd logs I see lots of searches such as

/devbox/search?q=%29%20AND%203318%3D4385%20AND%20%287778%3D7778

i.e.: ) AND 3318=4385 AND (7778=7778
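
For reference, a small sketch of how the logged value maps to the decoded string, using only the JDK; the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeLoggedQuery {
    // Decodes the percent-encoded value of the q parameter as it appears in the access log.
    static String decode(String rawQueryValue) {
        return URLDecoder.decode(rawQueryValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("%29%20AND%203318%3D4385%20AND%20%287778%3D7778"));
    }
}
```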

I guess they might be fishing for something.

For the fuzzy search I use different distance values; the default is ~0.6

String distance = "~0.6";

.........

bQueryF.add(queryParser.parse(QueryParser.escape(term)
                                        .replace("~", "") + distance),
                                BooleanClause.Occur.MUST);


From the query string above I get the error shown below. I added QueryParser.escape(term), but maybe this does nothing here. Is there a way to escape these, or to configure Lucene to just return no results rather than throw an exception?

2024-08-11 09:19:05,427 ERROR me.search.operations.SearchMe SearchMe:doRun - Error searching index
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'AND~0.6': Encountered " <AND> "AND "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:141) ~[lucene-queryparser-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
at org.events.business.search.operations.SearchOperation.doRun(SearchOperation.java:202) [classes/:?]
at org.events.business.search.operations.ReadFromIndexOperation.run(ReadFromIndexOperation.java:29) [classes/:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]

Thanks
