Github user afs commented on the pull request:
https://github.com/apache/jena/pull/95#issuecomment-212116452
An alternative (mentioned before) is to cache the query results before
serialization and replay an iterator when there is a cache hit. This could be
done with an iterator wrapper that copies what it sees. This could also help
with the fact that there is no control over the results - e.g. if the results are
very long, it all gets cached.
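To make the idea concrete, here is a rough sketch of such a copying wrapper. It is not code from this PR or from Jena: `CopyingIterator`, `maxRows`, `isCacheable()` and `cachedRows()` are hypothetical names, and the row type is left generic. The assumption is that a result is only worth caching once the source has been read to the end without exceeding the row limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical wrapper: passes rows through while copying up to maxRows of them. */
class CopyingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final int maxRows;
    private final List<T> copy = new ArrayList<>();
    private boolean overflow = false;

    CopyingIterator(Iterator<T> source, int maxRows) {
        this.source = source;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T row = source.next();
        if (!overflow) {
            if (copy.size() < maxRows) {
                copy.add(row);
            } else {
                // Over the limit: stop copying and free what was kept so far.
                overflow = true;
                copy.clear();
            }
        }
        return row;
    }

    /** True only when the source was fully consumed and stayed within the limit. */
    boolean isCacheable() {
        return !overflow && !source.hasNext();
    }

    /** The copied rows; only valid once isCacheable() is true. */
    List<T> cachedRows() {
        if (!isCacheable()) {
            throw new IllegalStateException("results incomplete or over the row limit");
        }
        return Collections.unmodifiableList(copy);
    }
}
```

With this shape, the cache entry would be created only after `isCacheable()` returns true, so a response that is abandoned part-way through never produces an entry.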
There are advantages and disadvantages:
1. +ve : If the same query is made with a different "Accept" header,
there is currently no cache hit and the query results are cached twice.
Caching before serialization avoids that.
1. +ve : The copying iterator can have some policy controls like limiting
caching results to N,000 rows. For robustness reasons, we probably want some
limits here so as not to cache the dataset by accident.
1. -ve : The results are serialized each time. Talking to Jetty at all is
going to be slower than Varnish and this adds to that.
1. +ve : A bonus is that LIMIT/OFFSET can be done (a later feature) if the
full results are executed. If the query is LIMIT/OFFSET+ORDER (to get stability
- a common idiom), the exact query isn't being repeated but it is an expensive
query. Note this is already optimized by a TopN query so the interactions here
are complicated, but if a repeatable results iterator is available, different
LIMIT/OFFSET can be done.
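As an illustration of the last point, different LIMIT/OFFSET windows could be served from one cached list of ordered rows. This is only a sketch under assumptions: `CachedResultSlice` and `slice` are hypothetical names, the rows are assumed to be the full ORDER BY result already held in memory, and a negative `limit` is taken to mean "no limit" (that convention is mine, not taken from the PR).

```java
import java.util.List;

/** Hypothetical helper: applies OFFSET/LIMIT to a fully cached, ordered result. */
class CachedResultSlice {
    static <T> List<T> slice(List<T> rows, long offset, long limit) {
        int size = rows.size();
        // Clamp the offset into [0, size].
        int from = (int) Math.min(Math.max(offset, 0L), (long) size);
        long remaining = size - from;
        // Negative limit means "no limit" (assumption); also cap at the end of the list.
        int to = (limit < 0 || limit > remaining) ? size : from + (int) limit;
        // A view over the cached rows; no copying.
        return rows.subList(from, to);
    }
}
```

For example, `slice(rows, 20, 10)` would give the third page of ten rows without re-running the ordered query.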
Questions and specific points on this PR:
1. There is a nasty corner case - for long queries, the client can go away
while the response is being sent back. If I read the code right, the cache entry
has already been created. Has this been tested? (It is hard to test for badly
behaved clients.)
1. How can an uninitialized cache entry get into the cache?
1. SPARQL_Query, SPARQL_Query_Cache. What's the relationship here? There
seems to be some duplication, log messages come out twice on first query and
action.endRead() is called twice on first query.