[ https://issues.apache.org/jira/browse/PHOENIX-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054651#comment-14054651 ]
James Taylor commented on PHOENIX-539:
--------------------------------------

The lease timeout is a different issue, I believe. It's caused primarily by doing a GROUP BY or ORDER BY on too big a chunk of data. The client in that case doesn't hear back from the server for a long time because it's busy trying to sort/group. I believe the best solution for that is to improve the parallelization such that smaller chunks are operated on, so that the client always hears back before the timeout occurs. There's also a Phoenix config for overall query execution time, but that can be set to a large time interval without any issues. Setting the lease time to a very large time interval has the negative side effect that you may not find out that your region server has gone down for potentially a long time.

That makes sense for the ORDER BY not using your optimization. The same would be the case for GROUP BY, I believe. The row key you'd get back from the scan wouldn't match the row key from the original data, since it'd be the row key based on the GROUP BY expressions. Have you seen issues with this? Probably best if you disable the optimization for GROUP BY as well.

For joins, in theory it could work, though. I suspect that the hash cache is getting cleared when the scan for the first chunk is closed, and then subsequent chunks wouldn't find it. Would you mind filing a JIRA for this?

+1 on the patch with these changes.

> Implement parallel scanner that does not spool to disk
> ------------------------------------------------------
>
>                 Key: PHOENIX-539
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-539
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: larsh
>         Attachments: PHOENIX-539.1.patch, PHOENIX-539.patch
>
>
> In scenarios where a LIMIT is not present on a non aggregate query that will
> return a lot of results, Phoenix spools the results to disk. This is less
> than ideal in these situations.
> @larsh has created a very good and relatively
> simple implementation that is queue based to replace this.

--
This message was sent by Atlassian JIRA (v6.2#6252)
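
As a rough illustration of the queue-based idea mentioned in the issue description, here is a minimal Python sketch. It is not the PHOENIX-539 patch and does not use any Phoenix or HBase API. The names `scan_chunk`, `parallel_scan`, and `max_buffered` are hypothetical, and plain in-memory lists stand in for the per-chunk scans. The point it tries to show is that a bounded queue lets several chunk scans run in parallel while only a small number of rows are buffered, so nothing needs to be spooled to disk.

```python
import queue
import threading

# Sentinel a worker puts on the queue once its chunk is exhausted.
_DONE = object()


def scan_chunk(rows, out):
    """Hypothetical worker: push every row of one chunk onto the shared queue."""
    for row in rows:
        # put() blocks while the queue is full, which keeps memory use bounded.
        out.put(row)
    out.put(_DONE)


def parallel_scan(chunks, max_buffered=4):
    """Yield rows from all chunks as they arrive, buffering at most max_buffered items."""
    out = queue.Queue(maxsize=max_buffered)
    workers = [
        threading.Thread(target=scan_chunk, args=(chunk, out), daemon=True)
        for chunk in chunks
    ]
    for worker in workers:
        worker.start()

    remaining = len(workers)
    while remaining:
        item = out.get()
        if item is _DONE:
            remaining -= 1  # one more chunk has finished
        else:
            yield item


if __name__ == "__main__":
    for row in parallel_scan([[1, 2, 3], [4, 5], [6]], max_buffered=2):
        print(row)
```

Rows from different chunks arrive interleaved in no guaranteed order, which is consistent with the comment above that this kind of optimization does not apply to ORDER BY.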