[jira] [Commented] (PHOENIX-539) Implement parallel scanner that does not spool to disk

James Taylor (JIRA) Tue, 08 Jul 2014 01:40:17 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054651#comment-14054651
 ]


James Taylor commented on PHOENIX-539:
--------------------------------------

The lease timeout is a different issue, I believe. It's caused primarily by 
doing a GROUP BY or ORDER BY on too big a chunk of data. The client in 
that case doesn't hear back from the server for a long time because the server 
is busy sorting/grouping. I believe the best solution for that is to improve 
the parallelization so that smaller chunks are operated on and the client 
always hears back before the timeout occurs.
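
To illustrate the "smaller chunks" idea, here is a minimal sketch, not the actual 
Phoenix code. It assumes row positions can be modelled as long values (Phoenix 
really works on byte[] row key ranges), and ChunkSketch, Range and split are 
hypothetical names. Each chunk is bounded in size, so the server has a bounded 
amount of sorting/grouping to do before it answers the client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one large scan range into bounded chunks.
class ChunkSketch {
    /** Half-open range [start, end) of simplified row positions. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [start, end) into consecutive ranges of at most maxChunk rows. */
    static List<Range> split(long start, long end, long maxChunk) {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        List<Range> chunks = new ArrayList<>();
        for (long s = start; s < end; s += maxChunk) {
            // The last chunk is clipped so it never runs past 'end'.
            chunks.add(new Range(s, Math.min(s + maxChunk, end)));
        }
        return chunks;
    }
}
```

Each Range could then be handed to its own scan, so that no single server call 
has to process the whole data set before responding.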

There's also a Phoenix config for overall query execution time, but that can be 
set to a large time interval without any issues. Setting the lease time to a 
very large time interval has the negative side effect that you may not find out 
for a long time that your region server has gone down.
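
As a rough sketch of the first point, the overall query timeout can be raised on 
the client side only. This assumes the property is named phoenix.query.timeoutMs 
(please verify against your Phoenix version); TimeoutConfigSketch and clientProps 
are hypothetical names. The HBase scanner lease is deliberately left alone here.

```java
import java.util.Properties;

// Hypothetical helper: builds client-side connection properties.
class TimeoutConfigSketch {
    /** Returns properties that raise only the overall Phoenix query timeout. */
    static Properties clientProps(long queryTimeoutMs) {
        if (queryTimeoutMs <= 0) {
            throw new IllegalArgumentException("queryTimeoutMs must be positive");
        }
        Properties props = new Properties();
        // Assumed property name for the overall query execution time limit.
        props.setProperty("phoenix.query.timeoutMs", Long.toString(queryTimeoutMs));
        return props;
    }
}
```

The resulting Properties would be passed as the second argument to 
DriverManager.getConnection along with the Phoenix JDBC URL.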

It makes sense that ORDER BY can't use your optimization. The same would 
be the case for GROUP BY, I believe. The row key you'd get back from the scan 
wouldn't match the row key from the original data since it'd be the row key 
based on the GROUP BY expressions. Have you seen issues with this? Probably 
best if you disable the optimization for GROUP BY as well.

For joins, in theory it could work, though. I suspect that the hash cache is 
getting cleared when the scan for the first chunk is closed and then subsequent 
chunks wouldn't find it. Would you mind filing a JIRA for this?

+1 on the patch with these changes

> Implement parallel scanner that does not spool to disk
> ------------------------------------------------------
>
>                 Key: PHOENIX-539
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-539
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: larsh
>         Attachments: PHOENIX-539.1.patch, PHOENIX-539.patch
>
>
> In scenarios where a LIMIT is not present on a non aggregate query that will 
> return a lot of results, Phoenix spools the results to disk. This is less 
> than ideal in these situations. @larsh has created a very good and relatively 
> simple implementation that is queue based to replace this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
