gerlowskija commented on code in PR #2499:
URL: https://github.com/apache/solr/pull/2499#discussion_r1664171169
##########
solr/core/src/java/org/apache/solr/core/SolrConfig.java:
##########
@@ -103,6 +103,8 @@ public class SolrConfig implements MapSerializable {
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
public static final String DEFAULT_CONF_FILE = "solrconfig.xml";
+ public static final String MIN_PREFIX_LENGTH = "minPrefixLength";
Review Comment:
Naming-wise, I was very much following the model of `maxBooleanClauses`.
But in hindsight, I think you're right that this is too ambiguous.
For now I'll change this to `minPrefixQueryTermLength`, but if there's
another option there you like better, lmk.
##########
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java:
##########
@@ -134,6 +135,19 @@ protected Query
newWildcardQuery(org.apache.lucene.index.Term t) {
}
}
+ @Override
+ protected Query getPrefixQuery(String field, String termStr)
throws ParseException {
+ // TODO check the field type and call QueryUtils.ensureBlah here
Review Comment:
I noticed this in testing initially and was surprised as well.
It looks like ComplexPhraseQParserPlugin assumes "text" and doesn't rely on
FieldType (or its subclasses) to do query-construction in a schema-aware
manner. I'd be curious to see what other QParsers act similarly, but I'm not
familiar enough with our query-parsing code generally to say whether it makes
sense or might be problematic. But definitely orthogonal to this PR either way.
##########
solr/core/src/java/org/apache/solr/schema/StrField.java:
##########
@@ -68,6 +71,20 @@ public List<IndexableField> createFields(SchemaField field,
Object value) {
return Collections.singletonList(fval);
}
+ @Override
+ public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
+ final var query = super.getPrefixQuery(parser, sf, termStr);
+
+ // Some internal usage (e.g. faceting) creates PrefixQueries without a
surrounding QParser, so
+ // check for null here before using QParser to access the limit value
+ if (query instanceof PrefixQuery && parser != null) {
+ final var minPrefixLength =
+ parser.getReq().getCore().getSolrConfig().prefixQueryMinPrefixLength;
+ QueryUtils.ensurePrefixQueryObeysMinimumPrefixLength(query, termStr,
minPrefixLength);
Review Comment:
Done!
##########
solr/core/src/test-files/solr/collection1/conf/solrconfig.xml:
##########
@@ -89,6 +89,18 @@
-->
<maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
+ <!-- Minimum acceptable prefix-size for prefix-based queries on string
fields.
+
+ Prefix-based queries consume memory in proportion to the number of
terms in the index
+ that start with that prefix. Short prefixes tend to match many many
more indexed-terms
+ and consume more memory as a result, sometimes causing stability
issues on the node.
+
+ This setting allows administrators to require that prefixes meet or
exceed a specified
+ minimum length requirement. Prefix queries that don't meet this
requirement return an
+ error to users.
+ -->
+ <minPrefixLength>${solr.min.prefixLength:2}</minPrefixLength>
Review Comment:
To avoid fragmentation, let's continue XML-tag naming discussion in this
thread [here](https://github.com/apache/solr/pull/2499/files#r1629916574).
For the governing sys prop, I've changed that to
'solr.query.minPrefixLength' as suggested.
##########
solr/core/src/java/org/apache/solr/schema/TextField.java:
##########
@@ -165,6 +167,20 @@ public Query getFieldTermQuery(QParser parser, SchemaField
field, String externa
return new TermQuery(new Term(field.getName(), br));
}
+ @Override
+ public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
+ final var query = super.getPrefixQuery(parser, sf, termStr);
Review Comment:
> Why override these methods in TextField & StrField with duplicative code
when you could modify FieldType.getPrefixQuery instead?
Moving the logic into `FieldType.getPrefixQuery` would have it apply to all
field types, which I'm not sure we want. The limit will probably always be
unhelpful for "boolean" and "enum" type fields for instance, since they're
generally so low cardinality as to be immune to the "runaway prefix query"
problem.
For now I've acquiesced and moved the logic to `FieldType`, but curious to
hear your thoughts on some of those exceptional cases.
##########
solr/core/src/java/org/apache/solr/search/QueryUtils.java:
##########
@@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) {
}
}
+ public static void ensurePrefixQueryObeysMinimumPrefixLength(
+ Query query, String prefix, int minPrefixLength) {
+ // TODO Should we provide a query-param to disable the limit on a
request-by-request basis? I
+ // can imagine scenarios where advanced users may want to enforce the
limit on most fields,
+ // but ignore it for a few fields that they know to be low-cardinality and
therefore "less
Review Comment:
Done!
##########
solr/core/src/java/org/apache/solr/search/QueryUtils.java:
##########
@@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) {
}
}
+ public static void ensurePrefixQueryObeysMinimumPrefixLength(
+ Query query, String prefix, int minPrefixLength) {
+ // TODO Should we provide a query-param to disable the limit on a
request-by-request basis? I
Review Comment:
Done!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]