[GitHub] jena pull request: JENA-626 SPARQL Query Caching

afs Tue, 19 Apr 2016 13:35:32 -0700

Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/95#issuecomment-212116452
  
    An alternative (mentioned before) is to cache the query results before 
serialization and replay an iterator when there is a cache hit.  This could be 
done with an iterator wrapper that copies what it sees. This could also help 
with the fact that there is no control over the results - e.g. if the results 
are very long, it all gets cached.
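    
    A minimal sketch of such a copying wrapper, to make the idea concrete. 
The class name `CopyingIterator`, the `maxRows` parameter and the 
`cacheableRows()` method are all hypothetical; none of them is part of Jena or 
of this PR, and the rows are plain generic objects rather than real query 
solutions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical wrapper: passes rows through while keeping a bounded copy. */
final class CopyingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final int maxRows;                 // assumed policy limit on cached rows
    private List<T> copy = new ArrayList<>();  // set to null once the limit is exceeded

    CopyingIterator(Iterator<T> source, int maxRows) {
        this.source = source;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T row = source.next();
        if (copy != null) {
            if (copy.size() < maxRows) {
                copy.add(row);
            } else {
                copy = null;                   // too many rows: stop copying, cache nothing
            }
        }
        return row;
    }

    /** The copied rows, present only if the source was fully consumed within the limit. */
    Optional<List<T>> cacheableRows() {
        if (copy == null || source.hasNext()) {
            return Optional.empty();
        }
        return Optional.of(Collections.unmodifiableList(copy));
    }
}
```

    In this sketch the caller would only create a cache entry after the 
response has been written and `cacheableRows()` returns a value, so a result 
that is too long or only partly consumed never reaches the cache.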
    
    
    There are advantages and disadvantages:
    
    1. +ve : If the same query is made with a different "Accept" header, the 
PR as it stands gets no cache hit and caches the query results twice; caching 
before serialization avoids this.
    
    1. +ve : The copying iterator can have some policy controls like limiting 
caching results to N,000 rows. For robustness reasons, we probably want some 
limits here so as not to cache the dataset by accident.
    
    1. -ve : The results are serialized each time.  Talking to Jetty at all is 
going to be slower than Varnish and this adds to that.
    
    1. +ve : A bonus is that LIMIT/OFFSET can be done (a later feature) if the 
full results are executed.  If the query is LIMIT/OFFSET+ORDER (to get 
stability - a common idiom), the exact query isn't being repeated but it is an 
expensive query.  Note this is already optimized by a TopN query so the 
interactions here are complicated, but if a repeatable results iterator is 
available, different LIMIT/OFFSET can be done.
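    
    As a rough illustration of the LIMIT/OFFSET point, assuming the full 
ordered result has already been copied into a list, a different window could 
be served from the cached rows. `CachedResults.slice` is a hypothetical helper 
and not Jena API; a negative `limit` is assumed here to mean "no LIMIT".

```java
import java.util.List;

final class CachedResults {
    /** Hypothetical helper: serve a LIMIT/OFFSET window from fully cached, ordered rows. */
    static <T> List<T> slice(List<T> rows, long offset, long limit) {
        int size = rows.size();
        // Clamp the offset into [0, size] so an out-of-range OFFSET yields an empty window.
        int from = (int) Math.min(Math.max(offset, 0L), (long) size);
        // A negative limit, or one reaching past the end, means "up to the last row".
        int to = (limit < 0 || limit > size - from) ? size : from + (int) limit;
        return rows.subList(from, to);
    }
}
```

    This only gives correct answers when the cached rows are the complete 
ORDERed result, which is why it does not combine directly with a result that 
was itself produced by the TopN optimization.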
    
    Questions and specific points on this PR:
    
    1. There is a nasty corner case - for long queries, the client can go away 
while the response is being sent back.  If I read the code right, the cache 
entry has already been created. Has this been tested? (It is hard to test for 
badly behaved clients.)
    1. How can an uninitialized cache entry get into the cache?
    1. SPARQL_Query, SPARQL_Query_Cache. What's the relationship here? There 
seems to be some duplication, log messages come out twice on first query and 
action.endRead() is called twice on first query.




