I agree that for the case where the pattern is
constant (passes isConstant ie is a string literal),
your case 1/) it's best treated as a syntax error --
which is the current behaviour, so no change needed
there.

In 2/, converting to a syntax error means (if I read
the code correctly) that the first bad pattern will
terminate the entire query. Is that compatible with
the SPARQL spec? I looked at the relevant sections
and left unsure.

If the query is terminated immediately then there's
no need to use Log.warnOnce?

Chris







On 29 March 2017 at 22:49, Andy Seaborne <[email protected]> wrote:

> There are two cases: I'd prefer to leave the static case as a syntax error
> and improve the dynamic case.
>
> 1/ If its a fixed pattern, it is compiled once.
> An error here should become a "syntax error" to
> E_Regex.init via makeRegexEngine.
>
> This is a common case.
>
> 2/ If a dynamic pattern, a ExprEvalException (it has no stacktrace because
> fillInStackTrace is overridden for efficiency) would be better.
>
> So throwing a ExprEvalException, catching in E_Regex.init and then
> throwing ExprException to become a syntax error as currently.
>
> In addition, in (2), Log.warnOnce would be good.
>
>     Andy
>
>
> On 24/03/17 11:55, chris dollin wrote:
>
>> Dear All
>>
>> When tracking down the cause of a disc overflow
>> on our Fuseki servers [0], we found that the log
>> files were full of stack dumps caused by queries
>> containing non-constant [1] illegal regular expression
>> patterns.
>>
>> In a query like
>>
>>     SELECT * WHERE {?s ?p ?o FILTER(REGEX(?o, ?o + ")DEF", "")}
>>
>> where the pattern part of the regex is not constant,
>> the regex is compiled (in RegexJava) when the filter is
>> evaluated against the solution bindings. If the pattern
>> is syntactically illegal, a PatternSyntaxException is
>> thrown, caught by RegexJava in makePattern, which then
>> thows an ExprException.
>>
>> The ExprException is caught in QueryIterFilterExpr.accept()
>> and a log entry generated, including the (large) stack
>> trace, and the filter is treated as FALSE so no solution
>> is generated.
>>
>> When there are many -- say millions -- of results to filter
>> this generates millions of unhelpful stack traces,
>> [unhelpful because it confuses the evaluation of the
>> SPARQL query with the execution of the Java ARQ code]
>> and the disc fills up with logfile.
>>
>> This is of course an edge case but we'd like to
>> avoid being cut so badly by the edge. Any other
>> filter expression that can throw EvalException will
>> have similar behaviour.
>>
>> At the very least we'd like to be able to reduce the
>> size of each log message by omitting the stack trace [2].
>> But when there are millions of results to filter,
>> there will still be millions of lines going to the
>> logs.
>>
>> An alternative would be to limit the number of warning log
>> entries generated for any given filter.
>>
>> If the pattern part of a regex filter is treated as part
>> of the query rather than part of the data, even in the
>> case above that we started with, then a syntax error in
>> the pattern could be regarded as an error in the query,
>> and the entire query execution abandoned. Making this
>> spec-compatible might be tricky, though.
>>
>> Thoughts?
>>
>> Chris
>>
>> [0] Fuseki 2.4.1
>>
>> [1] When the pattern is a constant, which it will
>>     usually be, it is compiled once before the query
>>     runs and generates a single different error.
>>
>> [2] One way to do this would be to have RegexJava.makePattern
>>     throw an ExprEvalException rather than an ExprException;
>>     we note empirically that an ExprEvalException generates
>>     a log entry that doesn't contain a stack trace. Or
>>     we could introduce additional specific exception types.
>>
>>
>>


-- 
"What I don't understand is this ..."   Trevor Chaplin, /The Beiderbeck
Affair/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20
6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)

Reply via email to