[GitHub] phoenix issue #192: Phoenix Cursors implementation

ankitsinghal Tue, 06 Sep 2016 04:05:51 -0700

Github user ankitsinghal commented on the issue:

    https://github.com/apache/phoenix/pull/192
  
    @anirudha , thanks for the pull request. Is it possible for you to change 
the subject to include PHOENIX-2606, or link it somehow, so that we can see 
the comments logged on JIRA as well?
    
    
    With the current implementation, every query is modified to include the 
primary key columns and tries to use a row value constructor (RVC). This may 
produce incorrect results in the following cases, and sometimes there is no 
advantage in using it because it doesn't limit the scan.
    
    - Duplicate values for a column
    - ORDER BY on a non-primary-key axis
    - Primary key columns having null values
    - Aggregate queries
    
    
    @samarthjain can you please confirm, based on your observations, for which 
queries RVC cannot be used?
    
    @anirudha 
    To abstract away the query complexities, and for initial support of 
cursors, we should not modify any query. Instead, we can keep the ResultSet 
object open for the corresponding cursor (with a timeout) and cache rows as we 
proceed with next() calls (FETCH NEXT FROM cursor). The cache will then be used 
for previous() calls (FETCH PRIOR FROM cursor) on the ResultSet, and also for 
next() calls made after we have gone back with previous().
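
    A minimal sketch of the caching idea described above, not the Phoenix 
implementation. The class name CachingCursor is hypothetical, and a plain 
java.util.Iterator stands in for the open ResultSet; timeouts and cache limits 
are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical client-side cursor: caches every row pulled from a forward-only source. */
final class CachingCursor<T> {
    private final Iterator<T> source;                 // assumption: stands in for the open ResultSet
    private final List<T> cache = new ArrayList<>();  // rows already returned, in order
    private int position = -1;                        // cache index of the row last returned; -1 = before first

    CachingCursor(Iterator<T> source) {
        this.source = source;
    }

    /** FETCH NEXT: answer from the cache if we stepped back earlier, otherwise pull from the source. */
    T next() {
        if (position + 1 < cache.size()) {
            return cache.get(++position);
        }
        if (!source.hasNext()) {
            throw new NoSuchElementException("cursor exhausted");
        }
        T row = source.next();
        cache.add(row);
        position++;
        return row;
    }

    /** FETCH PRIOR: the source cannot go backwards, so only the cache can answer. */
    T previous() {
        if (position <= 0) {
            throw new NoSuchElementException("no prior row");
        }
        return cache.get(--position);
    }
}

final class CachingCursorDemo {
    public static void main(String[] args) {
        CachingCursor<String> cursor =
                new CachingCursor<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println(cursor.next());      // pulled from the source
        System.out.println(cursor.next());      // pulled from the source
        System.out.println(cursor.previous());  // served from the cache
        System.out.println(cursor.next());      // served from the cache again
    }
}
```

    Note that the query itself is never rewritten here; the cursor only 
remembers what the underlying result has already produced.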
    
    Pros/Cons of above approach:-
    
    Pros:-
    
    - Highly abstracted. We don't need to understand each and every query and 
develop logic separately for each of them.
    - Unlike the current implementation, we don't need to extend ORDER BY (on 
a non-primary-key axis) every time to include the primary key columns for 
uniqueness. Doing so causes a problem at the server because we will have more 
keys (almost all keys) to sort every time. (RVC will not restrict the scan in 
this case.)
    - The cache size can be limited by the user, and if we exhaust the cache, 
it can be refreshed by re-running the query with LIMIT+OFFSET only.
    - As an optimization for flat queries (including INDEX queries), the cache 
can be refreshed using the last and peeked SCAN keys instead of LIMIT+OFFSET 
(scan keys rather than RVC, so that nulls and duplicates are handled properly).
    - Snapshot/static queries can be provided by storing the compile time of 
OPEN CURSOR and using it as the upper bound of the scan's timestamp range.
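
    The bounded-cache item in the list above could look roughly like the 
following sketch. Everything here is an assumption for illustration: 
PageFetcher and WindowedCursor are hypothetical names, and fetch(limit, offset) 
stands in for re-running the cursor's query with LIMIT and OFFSET. The 
scan-key optimization and the snapshot timestamp are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for re-running the cursor's query with LIMIT and OFFSET. */
interface PageFetcher<T> {
    List<T> fetch(int limit, int offset);
}

/** Hypothetical cursor that keeps at most `capacity` rows and refills the window on a miss. */
final class WindowedCursor<T> {
    private final PageFetcher<T> fetcher;
    private final int capacity;                  // user-chosen cache limit
    private List<T> window = new ArrayList<>();  // currently cached rows
    private int windowStart = 0;                 // absolute offset of window.get(0)
    private int position = -1;                   // absolute index of the row last returned
    int refills = 0;                             // how often the query had to be re-run

    WindowedCursor(PageFetcher<T> fetcher, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.fetcher = fetcher;
        this.capacity = capacity;
    }

    /** FETCH NEXT: on a miss, refill with the window starting at the target row. */
    T next() {
        int target = position + 1;
        return moveTo(target, target);
    }

    /** FETCH PRIOR: on a miss, refill with the window ending at the target row. */
    T previous() {
        if (position <= 0) {
            throw new NoSuchElementException("no prior row");
        }
        int target = position - 1;
        return moveTo(target, Math.max(0, target - capacity + 1));
    }

    private T moveTo(int target, int refillOffset) {
        boolean miss = target < windowStart || target >= windowStart + window.size();
        if (miss) {
            List<T> page = fetcher.fetch(capacity, refillOffset);
            if (target - refillOffset >= page.size()) {
                throw new NoSuchElementException("cursor exhausted");
            }
            window = page;
            windowStart = refillOffset;
            refills++;
        }
        position = target;
        return window.get(target - windowStart);
    }
}

final class WindowedCursorDemo {
    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(10, 11, 12, 13, 14);
        // In-memory stand-in for "SELECT ... LIMIT ? OFFSET ?".
        PageFetcher<Integer> fetcher = (limit, offset) -> rows.subList(
                Math.min(offset, rows.size()), Math.min(offset + limit, rows.size()));
        WindowedCursor<Integer> cursor = new WindowedCursor<>(fetcher, 2);
        cursor.next();
        cursor.next();
        cursor.next();      // falls outside the first window, so the query is re-run
        cursor.previous();  // falls outside the second window, so the query is re-run again
        System.out.println("refills = " + cursor.refills);
    }
}
```

    Each refill re-runs the query, so a small cache trades memory for repeated 
computation; without a fixed snapshot timestamp, a refill may also see data 
that changed since the previous run.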
    
    Cons:-
    
    - Because previous() is not supported on the server (HBase), the client 
carries the overhead of maintaining a cache of results to support previous().
    - Results must be recalculated after we reach the cache limit or the 
scanner times out.
    
    
    @JamesRTaylor , WDYT? Any suggestions?



