Hi,
On 20.08.2024 at 11:45, Greg Huber wrote:
Thanks. Switching to SimpleQueryParser does resolve the exceptions.
Before, I was using classic.QueryParser/MultiFieldQueryParser. The
only difference is the tilde values. I guess these are now integers.
Basically, yes: the text is tokenized using the analyzer, so it splits
on the "~" and produces two tokens: the term as before and the
distance integer as another token. Of course, this depends on your
analyzer. SimpleQueryParser also respects a bit of syntax, so the code
is not as simple as described here, but basically that's how full-text
search works: take the user-entered text, tokenize/analyze it in the
same way as you do at indexing time, and then find matches in the index
for the query tokens.
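
To make the tokenization step concrete, here is a minimal sketch that runs a piece of query text through an analyzer and lists the tokens it produces. It assumes StandardAnalyzer and a placeholder field name "body"; neither is stated in this thread, and a different analyzer may split the text differently.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class AnalyzeQueryText {

    /** Runs the text through the analyzer and collects the resulting tokens. */
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        // "body" is a placeholder; the field name only matters for per-field analyzers.
        try (TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StandardAnalyzer; use the analyzer you index with.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // "~" is not part of a word, so the term and the number become separate tokens.
            System.out.println(tokens(analyzer, "lucene~2")); // prints [lucene, 2]
        }
    }
}
```

Running the same helper with the analyzer used at indexing time is a quick way to see which tokens a given user input will actually be matched against.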
Uwe
On 19/08/2024 12:32, Uwe Schindler wrote:
Hi,
Basically, my only recommendation is to NOT use the standard query
parser in code that is exposed to external users. Lucene has
multiple query parsers; the standard one is strict, has a
rigid syntax, and is therefore not targeted at end users. This also
applies to Solr: use dismax instead of the standard Lucene parser there!
As the standard parser also accepts any field name, it
carries a security risk as well (people searching on arbitrary
fields that were not intended to be public). In short: treat the
default Lucene query parser as vulnerable to something like "SQL
injection attacks" (just replace "SQL" with "Lucene Query Syntax").
For end-user queries I recommend using the "SimpleQueryParser",
which throws no exceptions and lets you configure which syntax
features are enabled and which are not. It also does not allow
users to pass field names, so it is basically written for the use
case "user enters some search terms without syntax knowledge".
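
As an illustration of that recommendation, the following is a minimal sketch of parsing end-user input with SimpleQueryParser. The field name "body", the choice of StandardAnalyzer, and the particular set of enabled flags are assumptions made for the example, not settings taken from this thread.

```java
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

class EndUserQueryParsing {

    static Query parseUserInput(Analyzer analyzer, String userInput) {
        // Only the fields listed here are searched; "body" is a placeholder name.
        Map<String, Float> fields = Map.of("body", 1.0f);
        // Enable only the syntax features end users should have (an example selection).
        int flags = SimpleQueryParser.PHRASE_OPERATOR
                | SimpleQueryParser.WHITESPACE_OPERATOR
                | SimpleQueryParser.FUZZY_OPERATOR;
        SimpleQueryParser parser = new SimpleQueryParser(analyzer, fields, flags);
        // Require all terms to match, instead of the default "any term".
        parser.setDefaultOperator(BooleanClause.Occur.MUST);
        return parser.parse(userInput);
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // The probe string from the httpd log quoted in this thread, URL-decoded.
            Query query = parseUserInput(analyzer, ") AND 3318=4385 AND (7778=7778");
            System.out.println(query);
        }
    }
}
```

Because the parser is given a fixed field map, a user typing something like "secret:foo" cannot redirect the search to another field; the text is simply analyzed as ordinary terms.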
If you want to apply hardcoded filters, never ever construct plain
string queries like that; they are always vulnerable to "SQL
injection" issues. Pass the user-entered queries to SimpleQueryParser,
or don't parse any syntax at all and instead use "match" queries
(as known from Elasticsearch/OpenSearch; in Lucene they are available
in the QueryBuilder class, e.g. QueryBuilder#createBooleanQuery). Your
hardcoded filters, in turn, should be created as native Query instances
(like TermQuery, PhraseQuery, ...). Finally, combine the different
queries using BooleanQuery and pass the combined result to
IndexSearcher. For programmatic query construction the Query
subclasses are the way to go.
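
A minimal sketch of that combination could look as follows. The field names "body" and "status" and the filter value "published" are hypothetical, chosen only for illustration; the user text goes through SimpleQueryParser, while the filter is built directly as a TermQuery and never passes through any parser.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class FilteredUserQuery {

    static Query build(Analyzer analyzer, String userInput) {
        // User-entered text: parsed leniently against one fixed field ("body" is a placeholder).
        Query userQuery = new SimpleQueryParser(analyzer, "body").parse(userInput);
        // Hardcoded filter: a native Query instance (hypothetical field and value).
        Query visibility = new TermQuery(new Term("status", "published"));
        return new BooleanQuery.Builder()
                .add(userQuery, BooleanClause.Occur.MUST)     // must match, contributes to score
                .add(visibility, BooleanClause.Occur.FILTER)  // must match, does not affect score
                .build();
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // The resulting Query is what would be passed to IndexSearcher#search.
            System.out.println(build(analyzer, "status:draft OR lucene"));
        }
    }
}
```

Since the filter is added as its own clause, nothing the user types can remove or alter it.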
Uwe
On 11.08.2024 at 10:37, Greg Huber wrote:
Looking through my httpd logs, I see lots of searches such as
/devbox/search?q=%29%20AND%203318%3D4385%20AND%20%287778%3D7778
i.e.: ) AND 3318=4385 AND (7778=7778
guess they might be fishing for something.
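
For reference, the decoded form above can be reproduced from the logged request with the JDK alone. This is only a small sketch using java.net.URLDecoder; the class and method names are made up for the example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeLoggedQuery {

    /** Decodes a percent-encoded query parameter value as UTF-8. */
    static String decode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value of the "q" parameter from the httpd log line.
        System.out.println(decode("%29%20AND%203318%3D4385%20AND%20%287778%3D7778"));
        // prints: ) AND 3318=4385 AND (7778=7778
    }
}
```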
For the fuzzy search I use different distance values, and the
default is ~0.6
String distance = "~0.6";
.........
bQueryF.add(queryParser.parse(QueryParser.escape(term)
.replace("~", "") + distance),
BooleanClause.Occur.MUST);
From the query string above I get the error shown below. I added
QueryParser.escape(term), but maybe this does nothing here. Is there
a way to escape these, or to configure Lucene to just return no results
rather than throw an exception?
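
One way to see why the escaping does not help here: QueryParser.escape only backslash-escapes special characters, so a bare keyword such as AND comes back unchanged and is still read as an operator. The sketch below assumes StandardAnalyzer and a placeholder field name "body"; the helper name is made up for the example.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;

class EscapeDoesNotCoverKeywords {

    /** Returns true if the classic QueryParser accepts the text, false on a ParseException. */
    static boolean parses(String queryText) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            // "body" is a placeholder default field.
            new QueryParser("body", analyzer).parse(queryText);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // escape() leaves the keyword as it is; there is no special character to escape.
        String escaped = QueryParser.escape("AND");
        System.out.println(escaped);                   // prints AND
        // Same input as in the logged error: the parser sees an operator, not a term.
        System.out.println(parses(escaped + "~0.6"));  // prints false
    }
}
```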
2024-08-11 09:19:05,427 ERROR me.search.operations.SearchMe SearchMe:doRun - Error searching index
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'AND~0.6': Encountered " <AND> "AND "" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
    at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:141) ~[lucene-queryparser-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    at org.events.business.search.operations.SearchOperation.doRun(SearchOperation.java:202) [classes/:?]
    at org.events.business.search.operations.ReadFromIndexOperation.run(ReadFromIndexOperation.java:29) [classes/:?]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Thanks
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de