Hi,
Basically, my only recommendation is to NOT use the standard query
parser in code that is exposed to external users. Lucene has multiple
query parsers; the standard one is strict, has a rigid syntax, and is
therefore not targeted at end users. This also applies to Solr: use
dismax there instead of the standard Lucene parser! Because the standard
parser also accepts any field name, it carries a risk of security issues
(people searching on arbitrary fields that were not intended to be
public). In short: treat the default Lucene query parser as vulnerable
to something like "SQL injection attacks" (just replace "SQL" with
"Lucene Query Syntax").
For end-user queries I recommend using the "SimpleQueryParser", which
throws no exceptions and lets you configure which syntax features are
enabled. It also does not allow users to pass field names, so it is
basically written for the use case "user enters some search terms
without syntax knowledge".
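A minimal sketch of that setup, assuming the lucene-queryparser module is on the classpath. The field names "title" and "body", their weights, and the class name are made up for illustration; pick the flags that fit your application.

```java
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

public class EndUserQueryExample {

    // Parses raw end-user input. SimpleQueryParser.parse() declares no
    // checked exception; malformed syntax is handled leniently.
    static Query parseUserInput(String userInput) {
        // Only these (hypothetical) fields are ever searched; the user
        // has no syntax for addressing other fields.
        Map<String, Float> fields = Map.of("title", 2.0f, "body", 1.0f);

        // Enable only the syntax features you want to expose.
        int flags = SimpleQueryParser.AND_OPERATOR
                | SimpleQueryParser.OR_OPERATOR
                | SimpleQueryParser.NOT_OPERATOR
                | SimpleQueryParser.PHRASE_OPERATOR
                | SimpleQueryParser.FUZZY_OPERATOR;

        SimpleQueryParser parser =
                new SimpleQueryParser(new StandardAnalyzer(), fields, flags);
        // Require all terms by default (assumption: AND semantics are wanted).
        parser.setDefaultOperator(BooleanClause.Occur.MUST);
        return parser.parse(userInput);
    }

    public static void main(String[] args) {
        // The probe string from the httpd log in the quoted message.
        System.out.println(parseUserInput(") AND 3318=4385 AND (7778=7778"));
    }
}
```

Note that in SimpleQueryParser the operators are "+", "|" and "-", so a literal "AND" in the input is treated as an ordinary search term.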
If you want to apply hardcoded filters, never ever construct plain
string queries like that; they are always vulnerable to "SQL injection"
issues. Pass the user-entered query to SimpleQueryParser, or don't
parse any syntax at all and instead use "match" queries (as known from
Elasticsearch/OpenSearch; in Lucene these are available in the
QueryBuilder class, e.g. QueryBuilder#createBooleanQuery). In addition,
create your hardcoded filters as native Query instances (like TermQuery,
PhraseQuery, ...). Finally, combine the different queries with
BooleanQuery and pass the combined result to IndexSearcher.
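As a rough sketch of the above: the filter field "visibility", its value "public", the search field "body", and the class name are invented for illustration. Returning MatchNoDocsQuery for input without any terms is my assumption about the desired behavior.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.QueryBuilder;

public class FilteredSearchExample {

    static Query build(Analyzer analyzer, String userInput) {
        // "Match"-style query: the input is only analyzed into terms,
        // no query syntax is interpreted at all.
        Query userQuery = new QueryBuilder(analyzer)
                .createBooleanQuery("body", userInput, BooleanClause.Occur.MUST);
        if (userQuery == null) {
            // createBooleanQuery returns null when analysis yields no terms.
            return new MatchNoDocsQuery();
        }

        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        // Hardcoded filter as a native Query instance, never as a string.
        builder.add(new TermQuery(new Term("visibility", "public")),
                BooleanClause.Occur.FILTER);
        builder.add(userQuery, BooleanClause.Occur.MUST);
        return builder.build();
    }

    public static void main(String[] args) {
        Query q = build(new StandardAnalyzer(), ") AND 3318=4385 AND (7778=7778");
        System.out.println(q); // pass this to IndexSearcher#search
    }
}
```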
For programmatic query construction the Query subclasses are the way to go.
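For the fuzzy search from the quoted message, a sketch with FuzzyQuery could look like the following. The tokenization (lowercase, split on anything that is not a letter or digit) is a simplification of mine and must match how the field was indexed. FuzzyQuery takes a maximum edit distance (0 to 2) rather than a similarity like 0.6, so the value to use is also an assumption.

```java
import java.util.Locale;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;

public class FuzzyTermsExample {

    // Builds one FuzzyQuery per input token; all tokens are required.
    static Query fuzzyAllTerms(String field, String userInput, int maxEdits) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        int clauses = 0;
        // Simplified tokenization (assumption): FuzzyQuery terms are not
        // analyzed, so they must be normalized like the indexed terms.
        String[] tokens = userInput.toLowerCase(Locale.ROOT)
                .split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            builder.add(new FuzzyQuery(new Term(field, token), maxEdits),
                    BooleanClause.Occur.MUST);
            clauses++;
        }
        // No usable terms: return no results instead of failing.
        return clauses == 0 ? new MatchNoDocsQuery() : builder.build();
    }

    public static void main(String[] args) {
        System.out.println(
                fuzzyAllTerms("body", ") AND 3318=4385 AND (7778=7778", 2));
    }
}
```

Because no query syntax is parsed, operators, parentheses and "~" in the input cannot cause a ParseException.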
Uwe
On 11.08.2024 at 10:37, Greg Huber wrote:
Looking through my httpd logs I see lots of searches such as
/devbox/search?q=%29%20AND%203318%3D4385%20AND%20%287778%3D7778
i.e.: ) AND 3318=4385 AND (7778=7778
I guess they might be fishing for something.
For the fuzzy search I use different distance values, and the default
is ~0.6
String distance = "~0.6";
.........
bQueryF.add(queryParser.parse(QueryParser.escape(term)
.replace("~", "") + distance),
BooleanClause.Occur.MUST);
From the query string above I get the error shown below. I added
QueryParser.escape(term), but maybe this does nothing here. Is there a
way to escape these, or to configure Lucene to just return no results
rather than throw an exception?
2024-08-11 09:19:05,427 ERROR me.search.operations.SearchMe SearchMe:doRun - Error searching index
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'AND~0.6': Encountered " <AND> "AND "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:141) ~[lucene-queryparser-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
at org.events.business.search.operations.SearchOperation.doRun(SearchOperation.java:202) [classes/:?]
at org.events.business.search.operations.ReadFromIndexOperation.run(ReadFromIndexOperation.java:29) [classes/:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Thanks
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de