[ https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar resolved HIVE-22960.
----------------------------------
Resolution: Won't Fix
> Approximate TopN Key Operator
> -----------------------------
>
> Key: HIVE-22960
> URL: https://issues.apache.org/jira/browse/HIVE-22960
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> "Unlike other operators such as join and group by, the top n operator
> shows a notable “long tail” characteristic: it saturates very quickly.
> Updates are frequent at the beginning and then drop to a very low update
> frequency.
> The approximation can be implemented in two ways. One way is to stop the
> array/heap update after a certain percentage of the data has been read, for
> example, 10% or 20%, if we know the table size. The other way is to set a
> frequency threshold for the array/heap update. After the threshold is met,
> stop the top n processing."
> [~rzhappy]
> [Attached screenshot: Screen Shot 2020-03-02 at 4.55.46 PM.png]
> Y axis: number of updates per 100 ms
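The two approximations in the description can be sketched in Java (the language Hive is written in). This is a minimal, hypothetical sketch, not the Hive TopN Key operator itself. The class ApproxTopN, its constructor parameters and its methods are illustrative names only. The sketch keeps the N smallest keys in a bounded max-heap. Strategy 1 freezes the heap after a given fraction of an estimated row count has been read. Strategy 2 freezes it once a window of rows produces fewer than a minimum number of heap updates. One further assumption: windows are counted in rows instead of the 100 ms wall-clock intervals shown in the screenshot, which keeps the behavior deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Hive code): keeps the N smallest keys seen so far
 * in a bounded max-heap and can stop updating it early, using either a
 * fraction-of-input cutoff or an update-frequency threshold.
 */
final class ApproxTopN {
    private final int n;
    // Max-heap: the head is the largest of the N smallest keys (current boundary).
    private final PriorityQueue<Long> heap;
    // Strategy 1: freeze after this many rows; 0 disables it.
    private final long cutoffRows;
    // Strategy 2: rows per measurement window; values <= 0 disable it.
    private final int windowRows;
    // Strategy 2: freeze when a full window has fewer heap updates than this.
    private final int minUpdatesPerWindow;

    private long rowsSeen = 0;
    private long totalUpdates = 0;
    private int updatesInWindow = 0;
    private long frozenAtRow = -1; // -1 while the heap is still maintained

    /**
     * @param estimatedRows table-size estimate for strategy 1 (<= 0 disables it)
     * @param fraction      share of the estimated rows to read before freezing
     */
    ApproxTopN(int n, long estimatedRows, double fraction,
               int windowRows, int minUpdatesPerWindow) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
        this.heap = new PriorityQueue<>(n, Collections.reverseOrder());
        this.cutoffRows = estimatedRows > 0 ? (long) Math.ceil(estimatedRows * fraction) : 0;
        this.windowRows = windowRows;
        this.minUpdatesPerWindow = minUpdatesPerWindow;
    }

    /** Processes one key and returns true if it changed the heap. */
    boolean offer(long key) {
        rowsSeen++;
        if (isFrozen()) {
            return false; // approximation active: the heap is no longer updated
        }
        boolean updated = false;
        if (heap.size() < n) {
            heap.add(key);
            updated = true;
        } else if (key < heap.peek()) {
            heap.poll(); // evict the current N-th smallest key
            heap.add(key);
            updated = true;
        }
        if (updated) {
            totalUpdates++;
            updatesInWindow++;
        }
        // Strategy 1: stop once a fixed share of the estimated input was read.
        if (cutoffRows > 0 && rowsSeen >= cutoffRows) {
            frozenAtRow = rowsSeen;
        }
        // Strategy 2: stop once a full window produced too few heap updates.
        if (!isFrozen() && windowRows > 0 && rowsSeen % windowRows == 0) {
            if (heap.size() == n && updatesInWindow < minUpdatesPerWindow) {
                frozenAtRow = rowsSeen;
            }
            updatesInWindow = 0;
        }
        return updated;
    }

    boolean isFrozen() { return frozenAtRow >= 0; }

    long frozenAtRow() { return frozenAtRow; }

    long totalUpdates() { return totalUpdates; }

    /** Returns the retained keys in ascending order. */
    List<Long> sortedKeys() {
        List<Long> keys = new ArrayList<>(heap);
        Collections.sort(keys);
        return keys;
    }
}
```

Here is an example of the accuracy trade-off. Take n = 3, an estimated 20 rows and a fraction of 0.5, then feed the keys 100, 99, ..., 1. The heap freezes after 10 rows and retains 91, 92 and 93, while the exact answer is 1, 2 and 3. Descending input is the worst case for this strategy.

The long tail itself follows from a standard argument. Assume distinct keys that arrive in uniformly random order. Row i (for i > N) enters the current top N with probability N/i. The expected number of heap updates after n rows is therefore N(1 + H_n - H_N) ≈ N(1 + ln(n/N)), where H_k is the k-th harmonic number. Updates thin out roughly as 1/i.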
--
This message was sent by Atlassian Jira
(v8.3.4#803005)