[jira] [Commented] (PHOENIX-539) Implement parallel scanner that does not spool to disk

James Taylor (JIRA) Mon, 07 Jul 2014 10:47:26 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053907#comment-14053907
 ]


James Taylor commented on PHOENIX-539:
--------------------------------------

[~gabriel.reid] - thanks so much for the patch. Sorry for the delay, but I 
missed the notification for this. [~maryannxue] would be the best one to 
suggest how to improve that check, but not sure if she's back online yet or not.

[~lhofhansl] - would you mind reviewing this or at a minimum explaining the 
approach you've taken/were planning to take? As I understand your approach, 
you're leaving the Scanner open for each parallel scan (as they don't consume 
resources on the server-side). The results are put in a blocking queue on the 
client with a depth of the number of parallel scans. Calls to next() on the 
client then pace the next() calls made through the Scanners.
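
To make the queue-based approach as I understand it concrete, here is a 
minimal sketch. It is not the code from the patch: QueueBackedScanSketch and 
drain are hypothetical names, plain Iterators stand in for the open Scanners, 
and error handling for a failing scan is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each Iterator stands in for one open Scanner.
public class QueueBackedScanSketch {
    // Marker a producer enqueues once its scan is exhausted.
    private static final Object DONE = new Object();

    /** Drains all scans in parallel; row order across scans is not preserved. */
    public static List<Object> drain(List<? extends Iterator<?>> scans) {
        int n = scans.size();
        List<Object> rows = new ArrayList<>();
        if (n == 0) {
            return rows;
        }
        // Queue depth equals the number of parallel scans.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(n);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            for (Iterator<?> scan : scans) {
                pool.execute(() -> {
                    try {
                        while (scan.hasNext()) {
                            // put() blocks while the queue is full, so the
                            // consumer's pace throttles every producer.
                            queue.put(scan.next());
                        }
                        queue.put(DONE);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            int finished = 0;
            while (finished < n) {
                Object item = queue.take(); // the client-side "next" call
                if (item == DONE) {
                    finished++;
                } else {
                    rows.add(item);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining scans", e);
        } finally {
            pool.shutdownNow();
        }
        return rows;
    }
}
```

Nothing is spooled to disk here: at most one queue's worth of rows is 
buffered on the client, and a producer only advances its scan when the 
consumer has made room.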

I'm +1 on the patch, as this is definitely an improvement, but it'd be ideal if 
some simplification of the overall parallelization/spooling could be done.

Couple of questions/comments: It seems that you disable your optimization when 
an order by is present or when a join is being done. Can you document why this 
is the case? Also, can you add a HashJoinInfo.isHashJoin(Scan) call that just 
checks for the existence of one of the join attributes on the scan instead of 
deserializing the info? 
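
Roughly what I have in mind for the check, as a sketch only: FakeScan is a 
hypothetical stand-in for the HBase Scan (whose getAttribute(String) returns 
null when an attribute is not set), and the attribute name below is an 
assumption, not necessarily the key HashJoinInfo actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an existence-only join check.
public class HashJoinCheckSketch {
    // Assumed attribute key; the real name used by HashJoinInfo may differ.
    public static final String HASH_JOIN_ATTR = "HashJoin";

    /** Minimal stand-in for the attribute map carried by an HBase Scan. */
    public static class FakeScan {
        private final Map<String, byte[]> attributes = new HashMap<>();

        public void setAttribute(String name, byte[] value) {
            attributes.put(name, value);
        }

        public byte[] getAttribute(String name) {
            return attributes.get(name); // null when the attribute is absent
        }
    }

    /** True if the join attribute is present; the value is never deserialized. */
    public static boolean isHashJoin(FakeScan scan) {
        return scan.getAttribute(HASH_JOIN_ATTR) != null;
    }
}
```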

> Implement parallel scanner that does not spool to disk
> ------------------------------------------------------
>
>                 Key: PHOENIX-539
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-539
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: larsh
>         Attachments: PHOENIX-539.patch
>
>
> In scenarios where a LIMIT is not present on a non-aggregate query that will 
> return a lot of results, Phoenix spools the results to disk. This is less 
> than ideal in these situations. @larsh has created a very good and relatively 
> simple queue-based implementation to replace this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
