[
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenxingwuying updated KUDU-3455:
---------------------------------
Description:
My partner (Chenbo Lu) encountered an OOM problem in his application,
which uses the Kudu Java client. He collected some information and did a lot of
analysis of this problem; I am sharing his work in this issue.
The application was killed by the OS very frequently because of OOM. With a Java
heap of 8GB (inner heap 5.5GB available), an in-list predicate with more than
10000 rows would not work (OOM happens). The Kudu table in his case has about
1500 columns. His scan requests look like '{*}select * from profile_wos where id in
(...){*}'.
The problem only happens when the KuduScanPredicate is an in-list predicate; other
predicates have no problem.
He found that memory consumption is positively correlated with (count of ids *
count of columns). In fact, I think the count of values in each in-list column
is also a very important factor.
When using the Kudu API to build a scanner, memory reaches a very high
watermark, and multi-threading makes the problem worse. The picture below shows
this and confirms that the in-list predicate consumes very high memory.
!image-2023-03-11-16-57-16-589.png!
Reduce the space complexity of pruning hash partitions for in-list predicates
The logic that prunes hash partitions for in-list predicates in the Java client
has a high space complexity, and it may cause the Java client to run out
of memory. At the same time, PartialRow is deep-copied many times, which may be
slow.
!image-2023-03-06-17-23-35-119.png!
So we need to fix the problem, both to reduce the space complexity and to
improve speed.
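To illustrate the kind of space saving meant here, below is a minimal sketch in plain Java. It is not the Kudu client code: the class InListPruneSketch, the methods bucketOf, pruneWithFullRows and pruneWithKeysOnly, and the toy hash function are all hypothetical, and a long[] stands in for a full-width row. It only assumes that the set of candidate hash buckets depends on the in-list key values alone, so one full-width row per value does not need to be kept.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; none of these names come from the Kudu client.
final class InListPruneSketch {

    // Toy stand-in for a hash-bucket function (assumption, not Kudu's hash).
    static int bucketOf(long id, int numBuckets) {
        return Math.floorMod(Long.hashCode(id), numBuckets);
    }

    // Retains one full-width row per in-list value before hashing:
    // extra space grows with (count of ids * count of columns).
    static BitSet pruneWithFullRows(List<Long> ids, int numColumns, int numBuckets) {
        List<long[]> rows = new ArrayList<>();
        for (long id : ids) {
            long[] row = new long[numColumns]; // full-width row, only column 0 is used
            row[0] = id;
            rows.add(row);
        }
        BitSet buckets = new BitSet(numBuckets);
        for (long[] row : rows) {
            buckets.set(bucketOf(row[0], numBuckets));
        }
        return buckets;
    }

    // Hashes each key value directly: extra space is bounded by the bucket count.
    static BitSet pruneWithKeysOnly(List<Long> ids, int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        for (long id : ids) {
            buckets.set(bucketOf(id, numBuckets));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10_000; i++) {
            ids.add(i);
        }
        // A small column count keeps the demo light; the reported table has about 1500.
        BitSet withRows = pruneWithFullRows(ids, 16, 8);
        BitSet keysOnly = pruneWithKeysOnly(ids, 8);
        System.out.println("same buckets: " + withRows.equals(keysOnly));
    }
}
```

Both methods select the same buckets; the difference is only in how much memory is held while computing them.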
was:
My partner (Chenbo Lu) encountered an OOM problem in his application,
which uses the Kudu Java client.
He collected some information and did a lot of analysis of this problem; I am
sharing his work in this issue.
The application was killed very frequently because of OOM. With a Java heap
of 8GB (inner heap 5.5GB available), more than 10000 rows would not work.
The Kudu table in his case has about 1500 columns. His scans look like '{*}select *
from profile_wos where id in (...){*}'.
The problem happens when the KuduScanPredicate is an in-list predicate. Other
predicates have no problem.
He found that memory consumption is positively correlated with (count of ids *
count of columns). In fact, I think the length of each value in every in-list
column is also a key variable.
When the Kudu API creates a new scanner, memory reaches a very high level, and
multi-threading makes the problem worse. A picture can explain this and shows
that the in-list predicate consumes very high memory.
!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!
Improve the space complexity of pruning hash partitions for in-list predicates
The logic that prunes hash partitions for in-list predicates in the Java client
has a high space complexity, and it may cause the Java client to run out
of memory. At the same time, PartialRow is deep-copied many times, which may be
slow.
!image-2023-03-06-17-23-35-119.png!
So we need to fix the problem, both to improve the space complexity and to
improve speed.
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
> Key: KUDU-3455
> URL: https://issues.apache.org/jira/browse/KUDU-3455
> Project: Kudu
> Issue Type: Task
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
> Attachments: image-2023-03-06-17-23-35-119.png,
> image-2023-03-11-16-57-16-589.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)