Thanks. Switching to SimpleQueryParser does resolve the exceptions;
before, I was using classic.QueryParser/MultiFieldQueryParser. The only
difference is the tilde values. I guess these are now integers.
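For anyone else migrating, here is a rough sketch of how a fractional
value such as ~0.6 could be mapped to an integer edit distance. It
assumes the classic parser converts the value the way the deprecated
FuzzyQuery.floatToEdits is understood to: values >= 1 are taken as an
edit count, fractions are scaled by the term length, and the result is
capped at 2 edits. The class name and constant below are made up for
illustration, so check the behaviour against your Lucene version.

```java
// Hypothetical helper, not Lucene API; mirrors the conversion the
// classic parser is assumed to apply to "~0.6"-style values.
final class FuzzySlopSketch {

    // Lucene's fuzzy matching supports at most 2 edits.
    static final int MAX_EDITS = 2;

    // Maps an old-style similarity (or an edit count >= 1) to an
    // integer number of edits for a term of the given length.
    static int toEdits(float minimumSimilarity, int termLength) {
        if (minimumSimilarity >= 1f) {
            // Already an edit count; just cap it.
            return (int) Math.min(minimumSimilarity, MAX_EDITS);
        } else if (minimumSimilarity == 0.0f) {
            // Zero means an exact match.
            return 0;
        } else {
            // Fraction of the term that may differ, rounded down and capped.
            return Math.min((int) ((1D - minimumSimilarity) * termLength), MAX_EDITS);
        }
    }
}
```

Under these assumptions, ~0.6 comes out as 1 edit for a 5-letter term
and 2 edits for a 10-letter term, so ~1 or ~2 are the values to try
with SimpleQueryParser.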
On 19/08/2024 12:32, Uwe Schindler wrote:
Hi,
Basically, my only recommendation is NOT to use the standard query
parser in code that is exposed to external users. Lucene has
multiple query parsers; the standard one is strict, has a
strong syntax, and is therefore not targeted at end users. This also
applies to Solr: use dismax instead of the Lucene standard parser there!
As the standard parser also allows passing any field name, it also
risks security issues (people searching on arbitrary fields
that were not intended to be public). In short: treat the default Lucene
query parser as vulnerable to something like "SQL injection attacks"
(just replace "SQL" with "Lucene Query Syntax").
For end-user queries I recommend using the "SimpleQueryParser", which
throws no exceptions and lets you configure which syntax features
are enabled. It also does not allow users to pass
field names, so it is basically written for the use case "user enters
some search terms without syntax knowledge".
If you want to apply hardcoded filters, never ever construct plain
string queries like that; they are always vulnerable to "SQL
injection" issues. Pass the user-entered queries to SimpleQueryParser,
or don't parse any syntax at all and instead use "match" queries
(as known from Elasticsearch/OpenSearch; in Lucene they are available in
the QueryBuilder class, e.g. QueryBuilder#createBooleanQuery). In addition,
create your hardcoded filters as native Query instances (like
TermQuery, PhraseQuery, ...). Finally, combine the different queries
with BooleanQuery and pass the combined result to
IndexSearcher. For programmatic query construction the Query
subclasses are the way to go.
Uwe
On 11.08.2024 at 10:37, Greg Huber wrote:
Looking through my httpd logs I see lots of searches such as
/devbox/search?q=%29%20AND%203318%3D4385%20AND%20%287778%3D7778
i.e.: ) AND 3318=4385 AND (7778=7778
I guess they might be fishing for something.
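For reference, a small sketch of decoding the logged "q" value to see
exactly what text reaches the query parser. The class and method names
are made up for illustration; only java.net.URLDecoder from the JDK is
used.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original application.
final class LoggedQueryDecoder {

    // Decodes a percent-encoded request parameter value as UTF-8.
    static String decode(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%29%20AND%203318%3D4385%20AND%20%287778%3D7778";
        // prints: ) AND 3318=4385 AND (7778=7778
        System.out.println(decode(raw));
    }
}
```

The decoded text contains unbalanced parentheses and bare AND
operators, which is the kind of input a strict parser rejects.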
For the fuzzy search I use different distance values; the
default is ~0.6
String distance = "~0.6";
.........
bQueryF.add(
    queryParser.parse(QueryParser.escape(term).replace("~", "") + distance),
    BooleanClause.Occur.MUST);
From the query string above I get the error shown below. I added
QueryParser.escape(term), but maybe this does nothing here. Is there
a way to escape these, or to configure Lucene to just return no results
rather than throw an exception?
2024-08-11 09:19:05,427 ERROR me.search.operations.SearchMe SearchMe:doRun - Error searching index
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'AND~0.6': Encountered " <AND> "AND "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:141) ~[lucene-queryparser-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
at org.events.business.search.operations.SearchOperation.doRun(SearchOperation.java:202) [classes/:?]
at org.events.business.search.operations.ReadFromIndexOperation.run(ReadFromIndexOperation.java:29) [classes/:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Thanks