[ 
https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542073#comment-13542073
 ] 

Rob Vesse edited comment on JENA-228 at 1/2/13 2:22 PM:
--------------------------------------------------------

Regardless of implementation, this feature essentially boils down to a security 
feature: you are looking to secure your SPARQL engine (however it may be 
exposed) against intentional or inadvertent denial-of-service attacks.  These 
might be queries where calculating large numbers of results is costly, or 
queries which are simple but produce vast amounts of data - think SELECT * { ?s 
?p ?o } on very large databases.

FWIW, this was something we addressed at YarcData in our product by doing 
interception in two ways:

1 - At the SPARQL endpoint layer

We have a defined (and configurable) limit on results, and when a query comes 
in we inspect the existing LIMIT (if any) and add our own LIMIT as appropriate.  
Where a pre-existing limit exists we apply the lesser of the existing limit and 
the system limit, e.g. if the system limit is 100 and the query has LIMIT 1 we 
leave it as is.
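The clamping rule described above (take the lesser of the query's own LIMIT and the configured system limit) can be sketched as follows. This is an illustrative standalone class, not Jena API; the negative `NO_LIMIT` sentinel mirrors ARQ's convention of representing an absent LIMIT clause with a negative value:

```java
public final class LimitClamp {
    /** Sentinel meaning the query carries no LIMIT clause (hypothetical,
     *  mirroring ARQ's convention of a negative value for "no limit"). */
    public static final long NO_LIMIT = -1;

    /**
     * Return the LIMIT to enforce: the lesser of the query's own LIMIT
     * and the system limit. A query limit of NO_LIMIT means the system
     * limit applies unconditionally.
     */
    public static long effectiveLimit(long queryLimit, long systemLimit) {
        if (queryLimit == NO_LIMIT) {
            return systemLimit;
        }
        return Math.min(queryLimit, systemLimit);
    }

    public static void main(String[] args) {
        // System limit 100, query has LIMIT 1 -> leave it as is.
        System.out.println(effectiveLimit(1, 100));        // 1
        // No LIMIT in the query -> impose the system limit.
        System.out.println(effectiveLimit(NO_LIMIT, 100)); // 100
        // Query asks for more than allowed -> clamp down.
        System.out.println(effectiveLimit(500, 100));      // 100
    }
}
```

In a real endpoint this logic would run after parsing, rewriting the query's LIMIT before execution.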

2 - At the Query Iterator layer

Since we farm out the entire query to an external query engine and then 
translate the internal format of results back into Jena classes via a custom 
QueryIterator implementation, we also apply a limit at this point.  Applying a 
limit at this level is more useful when you want to make the query engine do 
all the work but don't need all the results sent directly to the client.  This 
is useful for us because we transmit large results over architectural 
boundaries using files on disk, so we can provide previews of results to 
clients with the full results available on disk for later consumption.
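A limit at the iterator layer amounts to a decorator that stops yielding after N items, regardless of how many the underlying engine could produce. A minimal sketch under that assumption (the class name is illustrative, not a Jena QueryIterator):

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Wraps an iterator and yields at most 'limit' elements from it. */
public final class LimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private long remaining;

    public LimitingIterator(Iterator<T> inner, long limit) {
        this.inner = inner;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        // Stop either when the quota is spent or the source is exhausted.
        return remaining > 0 && inner.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        remaining--;
        return inner.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> it =
            new LimitingIterator<>(List.of(1, 2, 3, 4, 5).iterator(), 3);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints 1, 2, 3
        }
    }
}
```

The underlying engine may still compute further results; the wrapper simply stops consuming them, which is what makes the "full results on disk, preview to the client" split possible.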

The latter approach is probably less applicable to ARQ because a query is 
answered by some combination of iterators, not a single iterator as in our 
case.  Personally I prefer the former approach because it is nice and early in 
the pipeline.  However, a general solution probably needs to sit somewhere in 
the middle to account for queries coming in both via SPARQL endpoints and via 
the API.
                
> Limiting query output centrally
> -------------------------------
>
>                 Key: JENA-228
>                 URL: https://issues.apache.org/jira/browse/JENA-228
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: ARQ, Fuseki
>    Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
>            Reporter: Giuseppe Sollazzo
>
> I was wondering whether there will be some way of limiting output in fuseki. 
> Basically, I'd like to be able to enforce limits on the number of results 
> returned by the system.
> As an example, think about a "numrows" in sql.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
