[ https://issues.apache.org/jira/browse/PHOENIX-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054650#comment-14054650 ]
Lars Hofhansl commented on PHOENIX-539:
---------------------------------------

Lemme look at the patch in a bit.

The approach I had in mind is based on the observation that an open scanner on an HBase region server does not consume any resources (just an identifier). So instead of spooling on the client, we can pace the scan on the server. Care must be taken that we transfer enough data to the client so that the client can proceed (for example, when it needs to execute a client-side sort). Specifically:
# For N parallel scanners with scanner caching X (i.e. a buffer of X rows per scanner), we need N*X worth of buffers on the client.
# We would never buffer more than X for a single scanner on the client (so that we can guarantee that the client can proceed, even for a client-side sort).
# The N scanners can then be handled by any number of threads fewer than N, doing round-robin through the scanners.

> Implement parallel scanner that does not spool to disk
> ------------------------------------------------------
>
>                 Key: PHOENIX-539
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-539
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: larsh
>         Attachments: PHOENIX-539.1.patch, PHOENIX-539.patch
>
>
> In scenarios where a LIMIT is not present on a non-aggregate query that will
> return a lot of results, Phoenix spools the results to disk. This is less
> than ideal in these situations. @larsh has created a very good and relatively
> simple queue-based implementation to replace this.
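The three-point pacing scheme in Lars's comment can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the PHOENIX-539 patch and not an HBase or Phoenix API: `PacedParallelScan`, `FakeScanner`, `mergeScan` and `nextRow` are made-up names, and `FakeScanner` stands in for a region-server scanner under the comment's assumption that an idle open scanner costs little more than its identifier. Each scanner gets a client-side budget of one batch of `caching` (X) rows, a fixed pool of `threads` fetchers (which may be fewer than N) walks the scanners round-robin and fetches only when the consumer has drained that scanner's previous batch, and a k-way merge stands in for a client-side sort, since it needs the head row of every scanner at once.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.LockSupport;

public class PacedParallelScan {
    /** Hypothetical stand-in for an open region-server scanner: it only keeps a cursor, so leaving
     *  it idle between fetches is cheap. Yields start, start + step, ... (all < end) in order. */
    static final class FakeScanner {
        private int next;
        private final int step, end;
        FakeScanner(int start, int step, int end) { this.next = start; this.step = step; this.end = end; }
        /** Returns up to `caching` rows; an empty batch means the scanner is exhausted. */
        List<Integer> nextBatch(int caching) {
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < caching && next < end) { batch.add(next); next += step; }
            return batch;
        }
    }

    /** Merges N individually sorted scanners with `threads` fetchers (may be fewer than N),
     *  holding at most one batch of `caching` rows per scanner: at most N * caching rows in total. */
    static List<Integer> mergeScan(List<FakeScanner> scanners, int caching, int threads)
            throws InterruptedException {
        int n = scanners.size();
        List<BlockingQueue<List<Integer>>> slots = new ArrayList<>();
        AtomicBoolean[] requested = new AtomicBoolean[n]; // consumer wants scanner i's next batch
        AtomicBoolean[] claimed = new AtomicBoolean[n];   // some fetcher is talking to scanner i
        for (int i = 0; i < n; i++) {
            slots.add(new ArrayBlockingQueue<>(1));
            requested[i] = new AtomicBoolean(true);       // prefetch the first batch of every scanner
            claimed[i] = new AtomicBoolean(false);
        }
        AtomicInteger live = new AtomicInteger(n);        // scanners not yet exhausted
        // Each fetcher walks round-robin over all N scanners and fetches only when asked, so an
        // unread scanner just sits idle on the server instead of being spooled on the client.
        Runnable fetcher = () -> {
            while (live.get() > 0) {
                boolean progressed = false;
                for (int i = 0; i < n; i++) {
                    if (!requested[i].get() || !claimed[i].compareAndSet(false, true)) continue;
                    try {
                        if (!requested[i].get()) continue;       // re-check while holding the claim
                        List<Integer> batch = scanners.get(i).nextBatch(caching);
                        requested[i].set(false);                 // clear before delivering the batch
                        slots.get(i).add(batch);                 // slot is empty here, so this cannot fail
                        if (batch.isEmpty()) live.decrementAndGet();
                        progressed = true;
                    } finally {
                        claimed[i].set(false);
                    }
                }
                if (!progressed) LockSupport.parkNanos(100_000); // nothing requested: back off briefly
            }
        };
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) pool.execute(fetcher);
        // Client-side k-way merge: needs every scanner's head row at once, which the budget allows.
        List<Iterator<Integer>> cur = new ArrayList<>(Collections.nCopies(n, Collections.<Integer>emptyIterator()));
        boolean[] pending = new boolean[n]; Arrays.fill(pending, true); // mirrors requested[i] = true
        PriorityQueue<int[]> heads = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < n; i++) {
            Integer v = nextRow(i, cur, pending, requested, slots);
            if (v != null) heads.add(new int[] {v, i});
        }
        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] top = heads.poll();                            // {row value, scanner index}
            out.add(top[0]);
            Integer v = nextRow(top[1], cur, pending, requested, slots);
            if (v != null) heads.add(new int[] {v, top[1]});
        }
        pool.shutdown(); pool.awaitTermination(10, TimeUnit.SECONDS);
        return out;
    }

    /** Next row of scanner i, or null once it is exhausted. A new batch is requested only after the
     *  previous one is fully consumed, so the client never holds more than one batch per scanner. */
    private static Integer nextRow(int i, List<Iterator<Integer>> cur, boolean[] pending,
            AtomicBoolean[] requested, List<BlockingQueue<List<Integer>>> slots) throws InterruptedException {
        while (!cur.get(i).hasNext()) {
            if (!pending[i]) { requested[i].set(true); pending[i] = true; }
            List<Integer> batch = slots.get(i).take();           // blocks until a fetcher delivers
            pending[i] = false;
            if (batch.isEmpty()) return null;                    // end-of-scanner marker
            cur.set(i, batch.iterator());
        }
        return cur.get(i).next();
    }

    public static void main(String[] args) throws InterruptedException {
        List<FakeScanner> scanners = new ArrayList<>();          // scanner i yields i, i+5, ... < 40
        for (int i = 0; i < 5; i++) scanners.add(new FakeScanner(i, 5, 40));
        System.out.println(mergeScan(scanners, 3, 2));           // 2 threads, 5 scanners; prints 0..39 in order
    }
}
```

Because a scanner is asked for its next batch only after the client has consumed the previous one, a slow consumer simply pauses the scan on the server rather than forcing a spool to disk, and the merge can always make progress with N*X rows buffered. The trade-off in this strict reading of point 2 is that a scanner's next round trip is not overlapped with consumption of its current batch; relaxing the budget to two batches per scanner would allow that overlap at twice the client memory.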