[jira] [Commented] (SOLR-12902) Block Expensive Queries custom Solr component

Hoss Man (JIRA) Mon, 14 Jan 2019 11:10:22 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-12902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742424#comment-16742424
 ]


Hoss Man commented on SOLR-12902:
---------------------------------

Quick comment on something specific...
{quote}... I have added a test-case in the code to explain the scenario in 
which the custom component will be helpful.
{quote}
Tirth: what you're describing is really more of an "example documentation" ... 
when folks talk about having test cases for new functionality/patches, what 
they mean is new JUnit-powered tests that are either unit tests proving that 
the underlying methods behave as documented, or integration-level tests showing 
that when a Solr request comes in, the search component behaves as expected (in 
this case: letting the request execute and return the expected results, or 
returning an expected error if it violates the configuration).
----
General feedback:

This is a type of functionality we've talked about for a long time, but one of 
the reasons we (or at least "I") have never tackled it head on relates to my 
main concern with the approach currently taken in the PR patch: it sets us 
down the path of needing a "laundry list" (which we have to maintain and 
constantly update moving forward) of every possible param/feature (and 
combinations thereof) that _some_ people *might* find problematic (with 
configuration options for all of them) in order to help ensure that something 
like this is useful for _most_ people.

The reason I say that is because typically when users come along to assess a 
feature like this, and they are concerned about "A, B, X & C", it's not useful 
to them if it only solves "A, B, C, & D" w/o support for X. Because if they 
need their own custom solution/plugin for preventing X they might as well 
incorporate a custom solution for "A, B, & C" as well, so they only need to 
worry about configuring one solution instead of two.

The permutations of things to worry about providing configuration options for 
is problematic as well, because it's not just a question of "here's *every* 
Solr param, let's add a config option to turn it off or limit its range of 
legal values" (if it were we could maybe simplify the impl w/a "rules" syntax 
that didn't need to know about specific param names) but it's also all about 
the permutations of interconnected params – ex: folks who want to support both 
faceting & highlighting, but not on the same requests; or highlighting is ok, 
as long as rows isn't too big.
----
I think the only way to offer a really re-usable generalized solution for 
something like this would be via the ScriptEngine, and letting people configure 
their own set of arbitrary script(s) that could be compiled on startup, and 
then evaled against the request params (and request context). We could test & 
provide some small re-usable sample/example scripts that people could choose to 
mix and match or customize ... similar to how SpamAssassin rules are 
provided/configured.

I think the simplest implementation on the Java side would be:
 * configure a list of script files
 * compile all scripts on init
 * at request time loop over each script in order and eval
 * if script eval result is something that is null or .equals() FALSE or "new 
Float(0)" continue
 * if script eval result is anything else, return the toString as an error 
message

that way people could write scripts like...
{code:java}
if (params["rows"] > 100) {
  return "rows param is too high"
}
if (params["start"] > 10) {
  return "start param is too high"
}
if (null != params["facet.pivot"] && null != params["highlight"]) {
  ...
}
...
return 0
{code}
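
For illustration only, here is a rough Java sketch of the request-time loop from the list above. To keep it runnable on any JDK, a plain functional interface stands in for a compiled script; in a real implementation each rule would presumably be a javax.script CompiledScript evaluated against the request params. The names ScriptedRequestGuard, RequestRule, isPass, intParam and sampleRules are all hypothetical, as are the default values for rows and start, and treating any numeric zero (not just a Float) as "pass" is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; none of these names exist in Solr. */
class ScriptedRequestGuard {

  /** Stand-in for one compiled script: request params in, arbitrary result out. */
  interface RequestRule extends Function<Map<String, String>, Object> {}

  private final List<RequestRule> rules;

  ScriptedRequestGuard(List<RequestRule> rules) {
    this.rules = rules;
  }

  /** null, Boolean.FALSE, or a numeric zero all mean "no objection, keep going". */
  static boolean isPass(Object result) {
    if (result == null || Boolean.FALSE.equals(result)) {
      return true;
    }
    return result instanceof Number && ((Number) result).doubleValue() == 0.0d;
  }

  /** Returns null if every rule passes, else the first non-passing result as a message. */
  String check(Map<String, String> params) {
    for (RequestRule rule : rules) {
      Object result = rule.apply(params);
      if (!isPass(result)) {
        return result.toString();
      }
    }
    return null;
  }

  /** Reads an int param, falling back to a default when it is absent. */
  static int intParam(Map<String, String> params, String name, int dflt) {
    String value = params.get(name);
    return value == null ? dflt : Integer.parseInt(value);
  }

  /** Java equivalents of the first two checks in the sample script; defaults are assumed. */
  static List<RequestRule> sampleRules() {
    return Arrays.asList(
        p -> intParam(p, "rows", 10) > 100 ? "rows param is too high" : null,
        p -> intParam(p, "start", 0) > 10 ? "start param is too high" : null);
  }

  public static void main(String[] args) {
    ScriptedRequestGuard guard = new ScriptedRequestGuard(sampleRules());
    Map<String, String> params = new HashMap<>();
    params.put("rows", "500");
    System.out.println(guard.check(params)); // prints "rows param is too high"
  }
}
```

The component itself never needs to know param names: it only interprets each rule's result, so all of the policy lives in the configured scripts.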
But we could potentially also support a variant option for simpler scripts 
w/less control over the error message returned...
 * configure a NamedList mapping error strings to lists of script files, ie...
{code:java}
<lst name="scripts">
  <str name="limited pagination support">script1.js</str>
  <arr name="unsupported param combinations">
     <str>script2.js</str>
     <str>script3.js</str>
     <str>script4.js</str>
  ...
{code}

 * compile all scripts on init, maintain a mapping to their error string
 * at request time loop over each script in order and eval
 * if script eval result is not .equals() TRUE then return the associated error 
string

that way people could have much simpler "boolean expression" scripts like
{code:java}
   (rows <= 100)
&& (start <= 10)
&& !(params["facet.pivot"] && params["highlight"])
&& ...
{code}
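
As a rough illustration of this variant, the Java sketch below maps each error string to an ordered list of boolean rules and returns the error string of the first rule whose result is not TRUE. As before, a functional interface stands in for a compiled script so the example runs without a script engine. BooleanRuleGuard, BoolRule, register, intParam and sample are hypothetical names, and the param names and limits simply mirror the pseudocode above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; none of these names exist in Solr. */
class BooleanRuleGuard {

  /** Stand-in for one compiled "boolean expression" script. */
  interface BoolRule extends Function<Map<String, String>, Object> {}

  // LinkedHashMap keeps the error strings in configuration order.
  private final Map<String, List<BoolRule>> rulesByError = new LinkedHashMap<>();

  /** Associates one error string with one or more rules. */
  void register(String error, BoolRule... rules) {
    rulesByError.put(error, Arrays.asList(rules));
  }

  /** Returns null if every rule evaluates to TRUE, else the failing rule's error string. */
  String check(Map<String, String> params) {
    for (Map.Entry<String, List<BoolRule>> entry : rulesByError.entrySet()) {
      for (BoolRule rule : entry.getValue()) {
        if (!Boolean.TRUE.equals(rule.apply(params))) {
          return entry.getKey();
        }
      }
    }
    return null;
  }

  /** Reads an int param, falling back to a default when it is absent. */
  static int intParam(Map<String, String> params, String name, int dflt) {
    String value = params.get(name);
    return value == null ? dflt : Integer.parseInt(value);
  }

  /** Sample configuration mirroring the error strings and checks sketched above. */
  static BooleanRuleGuard sample() {
    BooleanRuleGuard guard = new BooleanRuleGuard();
    guard.register("limited pagination support",
        p -> intParam(p, "rows", 10) <= 100 && intParam(p, "start", 0) <= 10);
    guard.register("unsupported param combinations",
        p -> !(p.containsKey("facet.pivot") && p.containsKey("highlight")));
    return guard;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("facet.pivot", "cat,inStock");
    params.put("highlight", "true");
    System.out.println(sample().check(params)); // prints "unsupported param combinations"
  }
}
```

The trade-off compared with the first approach is that the scripts stay trivial, but the error message is fixed per group rather than chosen by the script.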
 

A lot of the "plumbing" code we'd need for something like this already exists 
in the StatelessScriptUpdateProcessorFactory – we'd just need to refactor it 
into a "ScriptUtils" helper place, and tease out some bits that require the 
"Invocable" API since we wouldn't really need that here.

Thoughts?

> Block Expensive Queries custom Solr component
> ---------------------------------------------
>
>                 Key: SOLR-12902
>                 URL: https://issues.apache.org/jira/browse/SOLR-12902
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tirth Rajen Mehta
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Added a Block Expensive Queries custom Solr component ( 
> [https://github.com/apache/lucene-solr/pull/47|https://github.com/apache/lucene-solr/pull/477)]
>  ) :
>  * This search component can be plugged into your SearchHandler if you would 
> like to block some well known expensive queries.
>  * The queries that are blocked and failed by the component currently are deep 
> pagination queries, as they are known to consume a lot of memory and CPU. These 
> are:
>  ** queries with a start offset which is greater than the configured 
> maxStartOffset config parameter value
>  ** queries with a rows param value which is greater than the configured 
> maxRowsFetch config parameter value



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
