[jira] [Updated] (KUDU-3455) Improve space complexity about prune hash partitions for in-list predicate

shenxingwuying (Jira) Thu, 09 Mar 2023 02:22:07 -0800


     [ https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


shenxingwuying updated KUDU-3455:
---------------------------------
    Description: 
My partner (Chenbo Lu) encountered an OOM problem in their application,
which uses the Kudu Java client.

He collected information and did a lot of analysis on this problem; I am
sharing his work in this issue.

The application was killed by OOM very frequently. With an 8 GB Java heap
(about 5.5 GB of it usable), scanning more than 10,000 rows failed.

The Kudu table in his case has about 1500 columns. His scan looks like
'{*}select * from profile_wos where id in (...){*}'.

The problem occurs only when the KuduScanPredicate is an in-list predicate;
other predicates have no problem.

He found that memory consumption is positively correlated with (number of ids
* number of columns). I think the length of each value in the in-list is also
a key variable.
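
The observed correlation can be reproduced with a toy model. The sketch below is hypothetical (it is not Kudu code) and assumes the pruner deep-copies one full-width row per in-list value; a long[] with one slot per table column stands in for a PartialRow, so the memory held grows as (number of ids * number of columns).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not Kudu code): each "row" is a full-width
// long[] with one slot per table column, standing in for a PartialRow.
public class InListRowCopyModel {

    // Builds one deep copy of a full-width template row per in-list value,
    // setting only the key column. Memory grows as values * columns.
    static List<long[]> materializeRows(int numColumns, int keyColumn, List<Long> inListValues) {
        long[] template = new long[numColumns];
        List<long[]> rows = new ArrayList<>(inListValues.size());
        for (long value : inListValues) {
            long[] copy = Arrays.copyOf(template, numColumns); // copies every column slot
            copy[keyColumn] = value;
            rows.add(copy);
        }
        return rows;
    }

    // Counts the column slots held by the materialized rows.
    static long allocatedCells(List<long[]> rows) {
        long cells = 0;
        for (long[] row : rows) {
            cells += row.length;
        }
        return cells;
    }

    public static void main(String[] args) {
        int numColumns = 1500; // roughly the table width from the report
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 10_000; id++) {
            ids.add(id);
        }
        List<long[]> rows = materializeRows(numColumns, 0, ids);
        // 10,000 ids * 1,500 columns = 15,000,000 cells
        // (about 120 MB of long values in this toy model).
        System.out.println(allocatedCells(rows)); // prints 15000000
    }
}
```

Doubling either the number of ids or the number of columns doubles the cells held at once, which matches the reported correlation.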

When the Kudu API creates a new scanner, memory usage spikes very high, and
multiple threads make the problem worse. The picture below illustrates this
and shows that the in-list predicate consumes a lot of memory.

!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!

Improve the space complexity of hash partition pruning for in-list predicates

    In the Java client, the logic that prunes hash partitions for in-list
    predicates has high space complexity, which may cause the Java client to
    run out of memory. It also makes many deep copies of PartialRow, which
    may be slow.
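
One way to cut the space cost is to enumerate in-list combinations depth-first over a single reused key buffer and record matching buckets in a bitset, instead of materializing every combination as its own row copy. The Java sketch below contrasts the two under stated assumptions: it is not the Kudu client code, `bucketOf` is a hypothetical stand-in for Kudu's hash of the encoded key columns, and each key holds only the hash columns rather than a full-schema PartialRow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Sketch (hypothetical, not the Kudu implementation) of hash-bucket pruning
// for in-list predicates on the columns of one hash dimension.
public class InListHashPruningSketch {

    // Hypothetical stand-in bucket function; Kudu hashes the encoded key
    // columns, this uses Arrays.hashCode purely for illustration.
    static int bucketOf(long[] key, int numBuckets) {
        return Math.floorMod(Arrays.hashCode(key), numBuckets);
    }

    // Naive approach: materialize every combination as its own copy,
    // so peak memory is O(product of in-list sizes).
    static BitSet pruneByMaterializing(List<long[]> inLists, int numBuckets) {
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[inLists.size()]);
        for (int col = 0; col < inLists.size(); col++) {
            List<long[]> next = new ArrayList<>();
            for (long[] row : rows) {
                for (long value : inLists.get(col)) {
                    long[] copy = row.clone(); // one deep copy per combination
                    copy[col] = value;
                    next.add(copy);
                }
            }
            rows = next;
        }
        BitSet buckets = new BitSet(numBuckets);
        for (long[] row : rows) {
            buckets.set(bucketOf(row, numBuckets));
        }
        return buckets;
    }

    // Depth-first approach: reuse one key buffer and record buckets as we go,
    // so extra memory is O(number of hash columns) plus the bucket bitset.
    static BitSet pruneDepthFirst(List<long[]> inLists, int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        long[] key = new long[inLists.size()];
        visit(inLists, 0, key, numBuckets, buckets);
        return buckets;
    }

    private static void visit(List<long[]> inLists, int col, long[] key,
                              int numBuckets, BitSet buckets) {
        if (col == inLists.size()) {
            buckets.set(bucketOf(key, numBuckets));
            return;
        }
        for (long value : inLists.get(col)) {
            key[col] = value; // overwrite in place instead of copying the row
            visit(inLists, col + 1, key, numBuckets, buckets);
            if (buckets.cardinality() == numBuckets) {
                return; // every bucket is already needed; stop enumerating
            }
        }
    }

    public static void main(String[] args) {
        List<long[]> inLists = List.of(new long[] {1, 2, 3}, new long[] {10, 20});
        int numBuckets = 4;
        BitSet naive = pruneByMaterializing(inLists, numBuckets);
        BitSet dfs = pruneDepthFirst(inLists, numBuckets);
        System.out.println(naive.equals(dfs)); // prints true
    }
}
```

Both methods return the same bucket set. The materializing version holds every combination at once, while the depth-first version keeps one buffer, recurses only as deep as the number of hash columns, and stops early once every bucket is marked.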

!image-2023-03-06-17-23-35-119.png!

So we need to fix this problem: reduce the space complexity and improve
speed.

  was:
My partner (Chenbo Lu) encountered an OOM problem in their application,
which uses the Kudu Java client.

He collected information and did a lot of analysis on this problem; I am
relaying his work in this issue.

The application was killed by OOM very frequently. With an 8 GB Java heap
(about 5.5 GB of it usable), scanning more than 10,000 rows failed.

The Kudu table in his case has about 1500 columns. His scan looks like
'{*}select * from profile_wos where id in (...){*}'.

The problem occurs only when the KuduScanPredicate is an in-list predicate;
other predicates have no problem.

He found that memory consumption is positively correlated with (number of ids
* number of columns). I think the length of each value in the in-list is also
a key variable.

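When the Kudu API creates a new scanner, memory usage spikes very high, and
multiple threads make the problem worse.

When data_loader scans Kudu, it starts multiple threads to scan different
partitions. When these threads initialize their scanners at the same time,
they allocate a large amount of memory within a short period, which leads to
OOM.
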
!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!

Improve the space complexity of hash partition pruning for in-list predicates

    In the Java client, the logic that prunes hash partitions for in-list
    predicates has high space complexity, which may cause the Java client to
    run out of memory. It also makes many deep copies of PartialRow, which
    may be slow.

!image-2023-03-06-17-23-35-119.png!

So we need to fix this problem: reduce the space complexity and improve
speed.


> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>         Attachments: image-2023-03-06-17-23-35-119.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
