[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174283#comment-15174283
 ] 

John Omernik commented on DRILL-4460:
-------------------------------------

Would external hashing essentially allow the hash to be spilled to disk?  I 
want to be clear about what I mean by "switching": if there is a way to 
switch during the query, keeping the time and work already spent on one 
method, that would be great. But I am thinking of something cruder here: 
essentially, catch the out of memory error, alter session hashagg = false, 
rerun the query, then turn hash agg back on. It would be slow, to be sure, but 
from a user's perspective, a query that works but takes time is better than one 
that fails (see previous work: the Apache Hive project).
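
A minimal sketch of the crude fallback described above, in Python. It is not 
Drill code: `execute` is a hypothetical callable that sends one SQL statement 
over a single session, and `OutOfMemoryFailure` is a hypothetical stand-in for 
whatever error the client raises on out of memory. Only the option name 
`planner.enable_hashagg` comes from the issue. The sketch also assumes hash 
aggregation was enabled before the query ran, since it turns it back on 
afterwards.

```python
from typing import Any, Callable

# Option name taken from the issue description.
HASHAGG_OPTION = "planner.enable_hashagg"


class OutOfMemoryFailure(Exception):
    """Hypothetical stand-in for the client's out-of-memory error."""


def run_with_sort_fallback(execute: Callable[[str], Any], query: str) -> Any:
    """Run the query as-is; on out of memory, rerun it with hash aggregation off."""
    try:
        return execute(query)
    except OutOfMemoryFailure:
        # Work done by the failed attempt is lost; the query restarts from scratch.
        execute(f"ALTER SESSION SET `{HASHAGG_OPTION}` = false")
        try:
            return execute(query)
        finally:
            # Turn hash aggregation back on (assumes it was on before).
            execute(f"ALTER SESSION SET `{HASHAGG_OPTION}` = true")
```

The retry is invisible to the caller, which is why the issue also asks for a 
warning so the user learns that the fallback happened.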

To summarize: I am an admin with a bunch of users who are new to Drill. If 
they run a query with the default settings and it fails, they have to seek out 
how to fix it, or reach out to me or my team. If instead I had an option that 
allowed the query to succeed, and at the same time showed them how to do it 
differently, the user experience is better (they learn what was happening) 
and my experience is better (they don't interrupt my vacation with questions) 
:) 

I am not familiar with external hashing. What options control that? 

> Provide feature that allows fall back to sort aggregation
> ---------------------------------------------------------
>
>                 Key: DRILL-4460
>                 URL: https://issues.apache.org/jira/browse/DRILL-4460
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.5.0
>            Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a hash (in-memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it is memory dependent, and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort-based approach will spill to disk, allowing the 
> query to complete (more slowly).
> What I am requesting is a feature, off by default (so Drill's default 
> behavior will be the same after this feature is added), that would allow a 
> query that tried hash aggregation and failed due to out of memory to restart 
> as the same query with sort aggregation.  Basically, to let the query 
> succeed, it would try hash first, then go to sort.  This would make for a 
> better user experience in that the query would succeed. Perhaps a warning 
> could be shown to the user so that they understand that this occurred and 
> can go to a sort-based query by default in the future. 



