[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

leesf (Jira) Sat, 22 Aug 2020 17:19:44 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


leesf updated HUDI-1083:
------------------------
    Fix Version/s: 0.6.1

> Minor optimization in Determining insert bucket location for a given key
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1083
>                 URL: https://issues.apache.org/jira/browse/HUDI-1083
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Writer Core
>            Reporter: sivabalan narayanan
>            Assignee: shenh062326
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.1
>
>
> Currently, the bucket for a given key is determined as follows.
> In every partition, we find all insert buckets and assign each a weight.
> For example, weights of 0.2, 0.3 and 0.5 for a partition with 100 records
> to be inserted mean that 20 records go into B0, 30 into B1 and 50 into B2.
> Within getPartition(Object key), we walk linearly through the bucket
> weights to find the right bucket for the key. For instance, if
> mod(hash value) is 90/100 = 0.9, we keep adding bucket weights until the
> running sum exceeds 0.9.
> Instead, we could calculate the cumulative weights upfront, so that
> 0.2, 0.3, 0.5 becomes 0.2, 0.5, 1, and do a binary search on
> mod(hash value) within getPartition() to find the right bucket. This
> would cut the lookup cost from O(N) to O(log N), where N is the number of
> insert buckets in the partition.
>  
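
The proposal in the description can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual Hudi code: the
class name CumulativeBucketLookup and the methods cumulativeWeights and
findBucket are hypothetical, and the sketch assumes the hash has already been
reduced to a fraction r in [0, 1), as in the 90/100 = 0.9 example.

```java
// Hypothetical sketch; none of these names come from the Hudi code base.
final class CumulativeBucketLookup {

    // Turn per-bucket weights (e.g. 0.2, 0.3, 0.5) into running totals
    // (0.2, 0.5, 1.0). Done once per partition, upfront.
    static double[] cumulativeWeights(double[] weights) {
        double[] cumulative = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    // Binary search for the first bucket whose cumulative weight exceeds r.
    // This matches the linear walk that stops once the running sum exceeds r.
    // If rounding leaves the last total slightly below r, the last bucket
    // is returned.
    static int findBucket(double[] cumulative, double r) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;      // answer is mid or somewhere to its left
            } else {
                lo = mid + 1;  // answer is strictly to the right of mid
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] cumulative = cumulativeWeights(new double[] {0.2, 0.3, 0.5});
        double r = 90 / 100.0; // the mod(hash value) example from the issue
        System.out.println("bucket index: " + findBucket(cumulative, r));
    }
}
```

With the weights from the description, r = 0.9 lands in the third bucket (B2),
the same bucket the linear walk would pick, but each lookup costs O(log N)
comparisons instead of O(N) additions.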



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
