[
https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542073#comment-13542073
]
Rob Vesse edited comment on JENA-228 at 1/2/13 2:22 PM:
--------------------------------------------------------
Regardless of implementation, this essentially boils down to a security
feature: you are looking to secure your SPARQL engine (however it may be
exposed) against intentional or inadvertent denial of service attacks.
These might be queries where calculating large numbers of results is costly, or
queries which are simple but produce vast amounts of data - think SELECT * { ?s
?p ?o } on very large databases.
FWIW this was something we addressed at YarcData in our product by doing
interception in two ways:
1 - At the SPARQL endpoint layer
We have a defined (and configurable) limit on results. When a query comes in
we inspect the existing LIMIT (if any) and add our own LIMIT as appropriate.
Where a pre-existing limit exists we apply the lesser of the existing and
system limits, e.g. if the system limit is 100 and the query has LIMIT 1 we
leave it as is.
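As a rough illustration of the lesser-of-two-limits rule above, here is a
minimal sketch in plain Java. It is not the YarcData code and not an ARQ API:
the class name, the method name and the NO_LIMIT sentinel are all made up for
the example, and it assumes the query's LIMIT has already been extracted as a
long.

```java
// Hypothetical helper; names and the NO_LIMIT sentinel are illustrative only.
final class LimitClamp {
    /** Sentinel meaning "the incoming query has no LIMIT clause" (assumption). */
    static final long NO_LIMIT = -1;

    /**
     * Returns the LIMIT the query should run with: the system limit when the
     * query has none, otherwise the lesser of the query and system limits.
     */
    static long effectiveLimit(long queryLimit, long systemLimit) {
        if (systemLimit < 0) {
            throw new IllegalArgumentException("systemLimit must be >= 0");
        }
        if (queryLimit < 0) {
            // No LIMIT on the query: impose the system limit.
            return systemLimit;
        }
        // Existing LIMIT: keep it only if it is already the smaller one.
        return Math.min(queryLimit, systemLimit);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(1, 100));        // query LIMIT 1 is left as is
        System.out.println(effectiveLimit(NO_LIMIT, 100)); // system limit is added
        System.out.println(effectiveLimit(5000, 100));     // query LIMIT is reduced
    }
}
```

The value returned would then be written back onto the query before it is
executed; how that is done depends on the endpoint implementation.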
2 - At the Query Iterator layer
Since we farm out the entire query to an external query engine and then
translate the internal format of results back into Jena classes via a custom
QueryIterator implementation, we also apply a limit at this point. Applying it
at this level is more useful when you want to make the query engine do all the
work but don't need all the results sent directly to the client. This is useful
for us because we transmit large results over architectural boundaries using
files on disk, so we can provide previews of results to clients with the full
results available on disk for later consumption.
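The iterator-level cap can be pictured with a small wrapper like the one below.
This is only a sketch under stated assumptions: it uses plain java.util.Iterator
rather than Jena's QueryIterator, the CappedIterator name is hypothetical, and
it shows only the "preview" side (the full results staying on disk is not
modelled).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper; stands in for a limit applied inside a custom iterator.
final class CappedIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private long remaining;

    CappedIterator(Iterator<T> source, long maxResults) {
        if (maxResults < 0) {
            throw new IllegalArgumentException("maxResults must be >= 0");
        }
        this.source = source;
        this.remaining = maxResults;
    }

    @Override
    public boolean hasNext() {
        // Stop once the cap is reached, even if the source has more rows.
        return remaining > 0 && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return source.next();
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("row1", "row2", "row3", "row4").iterator();
        Iterator<String> preview = new CappedIterator<>(rows, 2);
        while (preview.hasNext()) {
            System.out.println(preview.next()); // only the first two rows are handed on
        }
    }
}
```

Note that the wrapper only limits what is handed to the caller; the underlying
engine has still computed whatever it computed.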
The latter approach is probably less applicable to ARQ because a query is
answered by some combination of iterators, not a single iterator as in our
case. Personally I prefer the former approach because it is nice and early in
the pipeline. However, a general solution probably needs to be somewhere more
in the middle to account for both queries coming in via SPARQL endpoints and
via the API.
> Limiting query output centrally
> -------------------------------
>
> Key: JENA-228
> URL: https://issues.apache.org/jira/browse/JENA-228
> Project: Apache Jena
> Issue Type: New Feature
> Components: ARQ, Fuseki
> Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
> Reporter: Giuseppe Sollazzo
>
> I was wondering whether there will be some way of limiting output in Fuseki.
> Basically, I'd like to be able to enforce limits on the number of results
> returned by the system.
> As an example, think about a "numrows" in SQL.