[jira] [Updated] (KUDU-3455) Improve space complexity about prune hash partitions for in-list predicate

shenxingwuying (Jira) Sat, 11 Mar 2023 01:06:12 -0800


     [ 
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


shenxingwuying updated KUDU-3455:
---------------------------------
    Description: 
My partner (Chenbo Lu) encountered an OOM problem in his application, which 
uses the Kudu Java client. He collected information and did a lot of analysis 
of the problem; I am sharing his work in this issue.

The application was frequently killed by the OS because of OOM. With an 8GB 
Java heap (about 5.5GB of it actually usable), an in-list predicate with more 
than 10000 values did not work (OOM occurred). The Kudu table in his case has 
about 1500 columns. His scan requests look like '{*}select * from profile_wos 
where id in (...){*}'.

 

The problem only happened when the KuduScanPredicate was an in-list predicate; 
other predicates had no problem.

He found that memory consumption is positively correlated with (number of ids x 
number of columns). In fact, I think the number of values in each in-list 
column is also a very important factor.

 

When the Kudu API is used to build a scanner, memory usage reaches a very high 
watermark, and multiple threads make the problem worse. The picture below 
illustrates this and shows that the in-list predicate consumes a lot of memory.

 

!image-2023-03-11-16-57-16-589.png!

Reduce the space complexity of pruning hash partitions for in-list predicates

    The java-client logic that prunes hash partitions for in-list predicates
    has high space complexity and can cause the java-client to run out of
    memory. It also makes many deep copies of PartialRow, which may be slow.

 

!image-2023-03-06-17-23-35-119.png!
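The pruning pattern described above can be contrasted with a lower-memory alternative in a small Java sketch. It is hypothetical and is not the java-client's code: key columns are modeled as `long` values, each IN list as a `long[]`, and `bucketOf()` uses `Arrays.hashCode` as a stand-in for Kudu's real hash function; all class and method names are invented. `naive()` builds a deep copy for every combination of in-list values before hashing, so it holds product(in-list sizes) copies at once. `streaming()` visits the same combinations depth-first while overwriting a single key buffer, so its extra memory is one buffer plus a `BitSet` of buckets, and it stops early once every bucket is known to be needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: which hash buckets can an IN-list scan touch? */
public class InListBucketPruning {

    // Stand-in hash for illustration only; NOT Kudu's hash function.
    static int bucketOf(long[] key, int numBuckets) {
        return Math.floorMod(Arrays.hashCode(key), numBuckets);
    }

    /** Naive: materialize every combination as its own copy, then hash. */
    public static BitSet naive(List<long[]> inLists, int numBuckets) {
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[inLists.size()]);
        for (int col = 0; col < inLists.size(); col++) {
            List<long[]> next = new ArrayList<>();
            for (long[] row : rows) {
                for (long value : inLists.get(col)) {
                    long[] copy = row.clone(); // one deep copy per combination
                    copy[col] = value;
                    next.add(copy);
                }
            }
            rows = next; // grows to product(in-list sizes) rows
        }
        BitSet buckets = new BitSet(numBuckets);
        for (long[] row : rows) {
            buckets.set(bucketOf(row, numBuckets));
        }
        return buckets;
    }

    /** Streaming: depth-first walk that reuses a single key buffer. */
    public static BitSet streaming(List<long[]> inLists, int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        walk(inLists, 0, new long[inLists.size()], numBuckets, buckets);
        return buckets;
    }

    /** Returns true once every bucket is marked, so callers can stop early. */
    private static boolean walk(List<long[]> inLists, int col, long[] key,
                                int numBuckets, BitSet buckets) {
        if (col == inLists.size()) {
            buckets.set(bucketOf(key, numBuckets));
            return buckets.cardinality() == numBuckets;
        }
        for (long value : inLists.get(col)) {
            key[col] = value; // overwrite in place instead of copying
            if (walk(inLists, col + 1, key, numBuckets, buckets)) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods return the same set of buckets. In the worst case the streaming walk still visits every combination, so apart from the early exit it reduces space rather than time, although skipping the per-combination copies also saves allocation work.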

So we need to fix this problem to reduce the space complexity and improve 
speed.

  was:
My partner (Chenbo Lu) encountered an OOM problem in their application, which 
uses the Kudu Java client.

He collected information and did a lot of analysis of this problem; I am 
sharing his work in this issue.

 

The application was frequently killed because of OOM. With an 8GB Java heap 
(about 5.5GB of it usable), more than 10000 rows would not work.

The Kudu table in his case has about 1500 columns. His scan looks like 
'{*}select * from profile_wos where id in (...){*}'.

 

The problem happened when the KuduScanPredicate was an in-list predicate. Other 
predicates had no problem.

He found that memory consumption is positively correlated with (number of ids x 
number of columns). In fact, I think the length of the values in every in-list 
column is also a key variable.

 

When the Kudu API creates a new scanner, memory usage becomes very high, and 
multiple threads make the problem worse. A picture can explain this and shows 
that the in-list predicate consumes a lot of memory.

!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!

Improve the space complexity of pruning hash partitions for in-list predicates

    The java-client logic that prunes hash partitions for in-list predicates
    has high space complexity and can cause the java-client to run out of
    memory. It also makes many deep copies of PartialRow, which may be slow.

 

!image-2023-03-06-17-23-35-119.png!

So we need to fix this problem to reduce the space complexity and improve 
speed.


> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>         Attachments: image-2023-03-06-17-23-35-119.png, 
> image-2023-03-11-16-57-16-589.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
