[ https://issues.apache.org/jira/browse/PHOENIX-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151990#comment-15151990 ]

James Taylor commented on PHOENIX-2606:
---------------------------------------

For GROUP BY, we currently sort the results of each scan on the client side and
then do the final merge/aggregation. This sort uses memory-mapped files, so in a
way it is already a kind of spooling. We could instead do the sort on the server
side and then pull the data over only as the final merge is done. Relevant JIRAs
for these improvements: PHOENIX-1217, PHOENIX-1006.
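To make the client-side flow concrete, here is a minimal in-memory sketch of the final merge/aggregation step. It assumes each scan returns partial aggregates already sorted by group key (here a String key with a partial COUNT). The names `MergeAggregateSketch`, `PartialRow` and `mergeAndAggregate` are hypothetical, not Phoenix APIs, and the sketch keeps everything in memory rather than spooling through memory-mapped files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeAggregateSketch {

    // Hypothetical partial-aggregate row: a GROUP BY key and a partial COUNT
    // produced by one scan (one region's share of the data).
    record PartialRow(String groupKey, long partialCount) {}

    // Heap entry remembering which per-scan iterator a row came from.
    private record Head(PartialRow row, Iterator<PartialRow> source) {}

    // k-way merge of per-scan results that are each already sorted by groupKey,
    // summing partial counts for equal keys into the final aggregate.
    static List<PartialRow> mergeAndAggregate(List<List<PartialRow>> sortedScans) {
        PriorityQueue<Head> heap = new PriorityQueue<>(
                (a, b) -> a.row().groupKey().compareTo(b.row().groupKey()));
        for (List<PartialRow> scan : sortedScans) {
            Iterator<PartialRow> it = scan.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<PartialRow> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            PartialRow row = head.row();
            // Refill the heap from the same scan so it holds one row per scan.
            if (head.source().hasNext()) {
                heap.add(new Head(head.source().next(), head.source()));
            }
            int last = result.size() - 1;
            if (last >= 0 && result.get(last).groupKey().equals(row.groupKey())) {
                // Same group seen from another scan: combine the partial counts.
                result.set(last, new PartialRow(row.groupKey(),
                        result.get(last).partialCount() + row.partialCount()));
            } else {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<PartialRow>> scans = List.of(
                List.of(new PartialRow("a", 2), new PartialRow("c", 1)),
                List.of(new PartialRow("a", 3), new PartialRow("b", 4)));
        // Prints the groups a, b, c with counts 5, 4, 1.
        System.out.println(mergeAndAggregate(scans));
    }
}
```

Because a heap-based merge holds only one row per scan at a time, it could in principle consume rows lazily as they arrive, which is what moving the per-scan sort to the server side would allow.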

> Cursor support in Phoenix
> -------------------------
>
>                 Key: PHOENIX-2606
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2606
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Sudarshan Kadambi
>
> Phoenix should support a cursor model in which the user can set the fetch
> size to limit the number of rows fetched in each batch. Each batch of result
> rows would be accompanied by a flag indicating whether there are more rows to
> be fetched for the query.
> The state management for the cursor could be done on the client side or the
> server side (i.e., in HBase, not in the Query Server). Client-side state
> management could involve capturing the last key in the batch and using it as
> the start key for the subsequent scan. The downside of this model is that if
> there were any intervening inserts or deletes in the result set of the query,
> backtracking on the cursor would reflect these changes (consider a page down
> followed by a page up showing a different set of result rows). Similarly, if
> the cursor is defined over the results of a join or an aggregation, these
> operations would need to be performed again each time the next batch of
> result rows is fetched.
> So an alternative approach would be to manage the state on the server side,
> with a query context area in the RegionServers (or perhaps just a temporary
> table) from which the cursor results are fetched. This gives the cursor
> snapshot isolation semantics. I think both models make sense, but it might be
> best to start with the state management entirely on the client side.
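The client-side model described in the issue (a fetch size, a "more rows" flag, and the last key of each batch reused as the start key of the next scan) can be sketched as keyset pagination. This is a minimal sketch under stated assumptions: a `TreeMap` stands in for an HBase table scanned in row-key order, and the names `ClientCursorSketch`, `Batch` and `fetchBatch` are hypothetical, not Phoenix or HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ClientCursorSketch {

    // One batch of results plus the "more rows?" flag described in the issue.
    record Batch(List<String> rowKeys, boolean hasMore, String lastKey) {}

    // Hypothetical stand-in for an HBase table: row keys kept in sorted order.
    private final NavigableMap<String, String> table;

    ClientCursorSketch(NavigableMap<String, String> table) {
        this.table = table;
    }

    // Fetch up to fetchSize rows strictly after lastKey (null = from the start).
    // The only cursor state is lastKey, which the client keeps and passes back
    // as the start key of the next scan.
    Batch fetchBatch(String lastKey, int fetchSize) {
        NavigableMap<String, String> remaining =
                lastKey == null ? table : table.tailMap(lastKey, false);
        List<String> keys = new ArrayList<>();
        for (String key : remaining.keySet()) {
            if (keys.size() == fetchSize) {
                // Peeked one row past the batch, so more rows exist.
                return new Batch(keys, true, keys.get(keys.size() - 1));
            }
            keys.add(key);
        }
        String newLast = keys.isEmpty() ? lastKey : keys.get(keys.size() - 1);
        return new Batch(keys, false, newLast);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>(Map.of(
                "r1", "v1", "r2", "v2", "r3", "v3", "r4", "v4", "r5", "v5"));
        ClientCursorSketch cursor = new ClientCursorSketch(table);

        Batch first = cursor.fetchBatch(null, 2);             // r1, r2
        table.put("r2a", "inserted");                         // intervening insert
        Batch second = cursor.fetchBatch(first.lastKey(), 2); // r2a, r3
        System.out.println(first);
        System.out.println(second);
    }
}
```

Each call re-scans live data from the saved key, so the second batch picks up a row inserted after the first batch was fetched. That is the lack of snapshot isolation the issue attributes to client-side state, and it is what server-side state would avoid.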



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
