[ 
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenxingwuying updated KUDU-3455:
---------------------------------
    Attachment: image-2023-03-11-16-57-16-589.png

> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>         Attachments: image-2023-03-06-17-23-35-119.png, 
> image-2023-03-11-16-57-16-589.png
>
>
> My partner (Chenbo Lu) encountered an OOM problem in an application that
> uses the Kudu Java client.
> He collected information and analyzed the problem in depth; I am sharing
> his work in this issue.
>  
> The application was killed by OOM very frequently. With an 8 GB Java heap
> (about 5.5 GB of which was usable), queries for more than 10,000 rows did
> not work.
> The Kudu table in his case has about 1,500 columns. His scan looks like
> 'select * from profile_wos where id in (...)'.
>  
> The problem happens only when the KuduScanPredicate is an in-list
> predicate; other predicates have no problem.
> He found that memory consumption is positively correlated with (number of
> ids * number of columns). In fact, I think the length of each in-list
> value is also a key variable.
>  
> When the Kudu API creates a new scanner, memory usage climbs very high, and
> multiple threads make the problem worse. The picture below illustrates this
> and shows that the in-list predicate consumes a great deal of memory.
>  
> !https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!
>  
> Improve space complexity about prune hash partitions for in-list predicate
>     When pruning hash partitions for an in-list predicate, the Java
>     client's logic has high space complexity, which may cause the Java
>     client to run out of memory. In addition, it makes many deep copies of
>     PartialRow, which may be slow.
>  
> !image-2023-03-06-17-23-35-119.png!
>  
> So we need to fix this problem to reduce the space complexity and improve
> speed.
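
The memory pattern reported above (usage growing with number of ids * number of columns) can be illustrated with a small, self-contained Java model. It is a hypothetical sketch, not the Kudu client's code: rows are plain Object[] arrays as wide as the schema (standing in for a row object such as PartialRow that reserves a slot per column), and Arrays.hashCode replaces Kudu's real hash function. The names InListPruningSketch, pruneNaive and pruneDfs are illustrative only. pruneNaive materializes every combination of in-list values as a full-width row copy, so its memory grows with (number of value combinations) * (number of columns). pruneDfs enumerates the same combinations depth-first, overwriting one reusable row and recording matching buckets in a BitSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of hash-partition pruning for IN-list predicates.
// A "row" is an Object[] as wide as the whole schema, standing in for a row
// object that reserves a slot for every column. This is NOT Kudu client code.
final class InListPruningSketch {

  // Hypothetical bucket function over the hash columns (Kudu uses its own hash).
  static int bucketOf(Object[] row, int[] hashCols, int numBuckets) {
    Object[] key = new Object[hashCols.length];
    for (int i = 0; i < hashCols.length; i++) {
      key[i] = row[hashCols[i]];
    }
    return Math.floorMod(Arrays.hashCode(key), numBuckets);
  }

  // Naive: build the Cartesian product of in-list values as full-width rows,
  // copying the row array for every combination. Peak memory is roughly
  // (number of combinations) * (number of columns) slots.
  static BitSet pruneNaive(int numColumns, int[] hashCols,
                           List<List<Object>> inLists, int numBuckets) {
    List<Object[]> rows = new ArrayList<>();
    rows.add(new Object[numColumns]);
    for (int d = 0; d < hashCols.length; d++) {
      List<Object[]> next = new ArrayList<>();
      for (Object[] row : rows) {
        for (Object value : inLists.get(d)) {
          Object[] copy = row.clone(); // a new full-width array per combination
          copy[hashCols[d]] = value;
          next.add(copy);
        }
      }
      rows = next;
    }
    BitSet buckets = new BitSet(numBuckets);
    for (Object[] row : rows) {
      buckets.set(bucketOf(row, hashCols, numBuckets));
    }
    return buckets;
  }

  // Depth-first: enumerate the same combinations but overwrite one reusable
  // row in place, so extra memory is one row plus the bucket bitmap
  // (and a recursion depth equal to the number of hash columns).
  static BitSet pruneDfs(int numColumns, int[] hashCols,
                         List<List<Object>> inLists, int numBuckets) {
    Object[] row = new Object[numColumns];
    BitSet buckets = new BitSet(numBuckets);
    visit(0, row, hashCols, inLists, numBuckets, buckets);
    return buckets;
  }

  private static void visit(int depth, Object[] row, int[] hashCols,
                            List<List<Object>> inLists, int numBuckets,
                            BitSet buckets) {
    if (buckets.cardinality() == numBuckets) {
      return; // every bucket already matches; further combinations add nothing
    }
    if (depth == hashCols.length) {
      buckets.set(bucketOf(row, hashCols, numBuckets));
      return;
    }
    for (Object value : inLists.get(depth)) {
      row[hashCols[depth]] = value; // overwrite instead of copying the row
      visit(depth + 1, row, hashCols, inLists, numBuckets, buckets);
    }
  }

  public static void main(String[] args) {
    List<Object> ids = new ArrayList<>();
    for (long id = 0; id < 1000; id++) {
      ids.add(id);
    }
    List<List<Object>> inLists = List.of(ids); // one hash column: "id"
    int[] hashCols = {0};
    BitSet naive = pruneNaive(1500, hashCols, inLists, 16);
    BitSet dfs = pruneDfs(1500, hashCols, inLists, 16);
    System.out.println("same buckets: " + naive.equals(dfs));
  }
}
```

Under this model, the reported workload (more than 10,000 ids against a table of about 1,500 columns) makes the naive path allocate at least 10,000 * 1,500 = 15,000,000 column slots before any value data is counted, while the depth-first path keeps a single 1,500-slot row plus a small bucket bitmap. This is one way to reduce the space complexity, not necessarily the approach the fix adopts.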



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
