[jira] [Updated] (IMPALA-10445) The ability to adjust NDV's precision with query option

Fifteen (Jira) Sun, 11 Apr 2021 19:17:05 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-10445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Fifteen updated IMPALA-10445:
-----------------------------
    Description: 
Since IMPALA-2658, we can trade memory for more accurate NDV estimation. This is 
attractive because tests show an error rate within 0.1% with no significant 
increase in resource usage (#registers is 2 << 18). Users may have fewer 
complaints about computation precision in the future.
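
As a rough illustration of the memory/accuracy trade-off mentioned above, the sketch below computes the textbook HyperLogLog relative standard error, approximately 1.04 / sqrt(m) for m registers. It is not taken from the Impala code base: the function names are hypothetical, and one byte per register is an assumption used only to size the sketch. Errors observed in tests can differ from this theoretical figure.

```python
import math


def hll_relative_std_error(num_registers: int) -> float:
    """Textbook HyperLogLog relative standard error: about 1.04 / sqrt(m)."""
    if num_registers <= 0:
        raise ValueError("num_registers must be positive")
    return 1.04 / math.sqrt(num_registers)


def sketch_memory_bytes(num_registers: int, bytes_per_register: int = 1) -> int:
    """Memory for one sketch; bytes_per_register=1 is an assumption."""
    return num_registers * bytes_per_register


if __name__ == "__main__":
    # Compare the two register counts that appear in this issue's description.
    for m in (1 << 18, 2 << 18):
        err = hll_relative_std_error(m)
        mem = sketch_memory_bytes(m)
        print(f"registers={m} std_error={err:.4%} memory_bytes={mem}")
```

Doubling the register count doubles the memory per sketch but shrinks the theoretical error only by a factor of sqrt(2), which is why the precision setting is worth tuning per workload.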

However, the road to applying high-precision NDV in production environments is 
not smooth. 

1) We have to rewrite the SQL for a large number of historical workloads, which 
is time-consuming and error-prone.

2) Cluster users, i.e. SQL writers, are reluctant to lower their expectations. 
It would be more convenient to have a way for cluster admins to adjust 
precision for each Admission Control queue according to the cluster's resource 
usage (roughly speaking).

Proposal:

Add a new query option DEFAULT_NDV_SCALE to change the default precision 
setting for NDV().

Implementation:
 # Add a query option in FE
 # If the option is set, use the matching NDV(,P) function instead of NDV(). 
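
The two implementation steps could be sketched as follows. This is a minimal illustration in Python rather than the actual frontend change: `FunctionCall` and `apply_default_ndv_scale` are hypothetical names, and the sketch assumes the option simply supplies the second argument of the two-argument NDV() form whenever a query calls NDV() with a single argument.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical stand-in for a parsed function call expression."""
    name: str
    args: Tuple[str, ...]


def apply_default_ndv_scale(call: FunctionCall,
                            default_ndv_scale: Optional[int]) -> FunctionCall:
    """Rewrite NDV(x) to NDV(x, scale) when the query option is set.

    Calls that already pass an explicit second argument, and calls to other
    functions, are returned unchanged. Range validation of the option value
    is deliberately left out of this sketch.
    """
    if default_ndv_scale is None:
        return call  # option not set: keep the default NDV() behaviour
    if call.name.lower() != "ndv" or len(call.args) != 1:
        return call  # not a single-argument NDV() call
    return FunctionCall(call.name, call.args + (str(default_ndv_scale),))


if __name__ == "__main__":
    original = FunctionCall("ndv", ("user_id",))
    print(apply_default_ndv_scale(original, 10))
```

Leaving explicit two-argument calls untouched means a query author's own precision choice would take priority over the queue-level default in this sketch.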

 

 

  was:
Since IMPALA-2658, we can trade memory for more accurate NDV estimation. This is 
attractive because tests show an error rate within 0.1% with no significant 
increase in resource usage (#registers is 1 << 18). Users may have fewer 
complaints about computation precision in the future.

However, the road to applying high-precision NDV in production environments is 
not smooth. 

1) We have to rewrite the SQL for a large number of historical workloads, which 
is time-consuming and error-prone.

2) Cluster users, i.e. SQL writers, are reluctant to lower their expectations. 
It would be more convenient to have a way for cluster admins to adjust 
precision for each Admission Control queue according to the cluster's resource 
usage (roughly speaking).

Proposal:

Add a new query option DEFAULT_NDV_SCALE to change the default precision 
setting for NDV().

Implementation:
 # Add a query option in FE
 # If the option is set, use the matching NDV(,P) function instead of NDV(). 

 

 


> The ability to adjust NDV's precision with query option
> -------------------------------------------------------
>
>                 Key: IMPALA-10445
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10445
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Fifteen
>            Assignee: Fifteen
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
