[ https://issues.apache.org/jira/browse/PHOENIX-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151990#comment-15151990 ]
James Taylor commented on PHOENIX-2606:
---------------------------------------

We currently do a sort per scan on the client-side for group by and then do
the final merge/aggregation. This sort uses memory mapped files, so in a way
it's a kind of spooling. We could potentially do the sort on the server-side
instead and then only pull over the data as the final merge is done. Here are
some relevant JIRAs for improvements: PHOENIX-1217, PHOENIX-1006.

> Cursor support in Phoenix
> -------------------------
>
>                 Key: PHOENIX-2606
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2606
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Sudarshan Kadambi
>
> Phoenix should look to support a cursor model where the user could set the
> fetch size to limit the number of rows that are fetched in each batch. Each
> batch of result rows would be accompanied by a flag indicating if there are
> more rows to be fetched for a given query or not.
> The state management for the cursor could be done in the client side or
> server side (i.e. HBase, not the Query Server). The client side state
> management could involve capturing the last key in the batch and using that
> as the start key for the subsequent scan operation. The downside of this
> model is that if there were any intervening inserts or deletes in the result
> set of the query, backtracking on the cursor would reflect these additional
> rows (consider a page down, followed by a page up showing a different set of
> result rows). Similarly, if the cursor is defined over the results of a join
> or an aggregation, these operations would need to be performed again when the
> next batch of result rows are to be fetched.
> So an alternate approach could be to manage the state server side, wherein
> there is a query context area in the Regionservers (or, maybe just a
> temporary table) and the cursor results are fetched from there. This ensures
> that the cursor has snapshot isolation semantics.
> I think both models make sense but it might make sense to start with the
> state management completely on the client side.
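
For illustration, the client-side model from the issue description (remember the last key of each batch, use it as the start of the next scan, and return a "more rows" flag with every batch) could look roughly like the sketch below. This is not Phoenix or HBase API: the ClientSideCursor and Batch names are hypothetical, and a TreeMap stands in for a table ordered by row key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical client-side cursor; a TreeMap stands in for a row-key-ordered table. */
public class ClientSideCursor {

    /** One batch of rows plus the "more rows to fetch" flag. */
    public static final class Batch {
        public final List<Map.Entry<String, String>> rows;
        public final boolean hasMore;

        Batch(List<Map.Entry<String, String>> rows, boolean hasMore) {
            this.rows = rows;
            this.hasMore = hasMore;
        }
    }

    private final NavigableMap<String, String> table;
    private final int fetchSize;
    private String lastKey; // the only cursor state; null before the first batch

    public ClientSideCursor(NavigableMap<String, String> table, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.table = table;
        this.fetchSize = fetchSize;
    }

    /** Fetches the next batch, starting strictly after the last key seen so far. */
    public Batch next() {
        NavigableMap<String, String> remaining =
                (lastKey == null) ? table : table.tailMap(lastKey, false);
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        boolean hasMore = false;
        for (Map.Entry<String, String> e : remaining.entrySet()) {
            if (rows.size() == fetchSize) {
                hasMore = true; // at least one row lies beyond this batch
                break;
            }
            rows.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        if (!rows.isEmpty()) {
            lastKey = rows.get(rows.size() - 1).getKey();
        }
        return new Batch(rows, hasMore);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(k, "row-" + k);
        }
        ClientSideCursor cursor = new ClientSideCursor(table, 2);
        Batch first = cursor.next();
        // A row written between batches is picked up by the next scan,
        // because nothing is snapshotted on the client.
        table.put("bb", "row-bb");
        Batch second = cursor.next();
        System.out.println(first.rows + " hasMore=" + first.hasMore);
        System.out.println(second.rows + " hasMore=" + second.hasMore);
    }
}
```

Because the cursor keeps nothing but the last key, a row inserted between two calls to next() appears in the later batch. That is the lack of snapshot isolation the description points out, and a cursor over a join or aggregation would additionally have to recompute its input for every batch.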