[
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenxingwuying updated KUDU-3455:
---------------------------------
Description:
My partner (Chenbo Lu) has encountered an OOM problem in their application,
which uses the Kudu Java client.
He collected some information and did a lot of analysis of this problem; I am
sharing his work in this issue.
The application was killed because of OOM very frequently. With a Java heap
of 8 GB (about 5.5 GB of it available), more than 10000 rows would not work.
The Kudu table in his case has about 1500 columns. His scan is like '{*}select *
from profile_wos where id in (...){*}'.
The problem happens when the KuduScanPredicate is an in-list predicate. Other
predicates have no problem.
He found that memory consumption is positively correlated with (number of ids *
number of columns). In fact, I think the length of each value in the in-list
columns is also a key variable.
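To make the (ids * columns) relationship concrete, here is a rough
back-of-the-envelope sketch. It is only an illustration: the bytes-per-cell
figure and the number of concurrently created scanners are assumptions, not
measurements from the report, and the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; every input below is an assumption except
// the 10000 ids and 1500 columns quoted in the report.
final class InListMemoryEstimate {

    /**
     * Rough bound on bytes held if each in-list value causes one full-width
     * row to be materialised for every concurrently created scanner.
     */
    static long estimateBytes(long inListValues, long columns,
                              long bytesPerCell, long concurrentScanners) {
        long cells = Math.multiplyExact(inListValues, columns);
        long perScanner = Math.multiplyExact(cells, bytesPerCell);
        return Math.multiplyExact(perScanner, concurrentScanners);
    }

    public static void main(String[] args) {
        // 8 bytes per cell and 16 scanners are illustrative guesses.
        long bytes = estimateBytes(10_000, 1_500, 8, 16);
        System.out.printf("estimated: %.2f GB%n", bytes / 1e9);
    }
}
```

The estimate grows linearly in each factor, so longer in-list values (a larger
bytes-per-cell figure) or more threads creating scanners at once raise it
proportionally.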
When the Kudu API creates a new scanner, memory usage becomes very high, and
multiple threads make the problem worse. The picture below illustrates this and
shows that the in-list predicate consumes a very large amount of memory.
!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!
Improve space complexity about prune hash partitions for in-list predicate
In the Java client, the logic that prunes hash partitions for in-list
predicates has a high space complexity, and it may cause the Java client to
run out of memory. At the same time, PartialRow is deep-copied many times,
which may be slow.
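The difference in space complexity can be sketched as follows. This is not
the Kudu client's actual code: the class, the method names and the hash
function are hypothetical stand-ins, and a plain Object[] stands in for a
full-width row. The first method keeps one full-width row per in-list value;
the second hashes only the key value and keeps just a set of buckets.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not taken from the Kudu Java client.
final class InListBucketPruning {

    /** Stand-in for a partition hash; the real client uses its own function. */
    static int bucketOf(long id, int numBuckets) {
        return Math.floorMod(Long.hashCode(id), numBuckets);
    }

    /** Row-copy style: retains O(ids * columns) cells while pruning. */
    static BitSet pruneWithRowCopies(List<Long> ids, int numColumns, int numBuckets) {
        List<Object[]> rows = new ArrayList<>();
        for (long id : ids) {
            Object[] row = new Object[numColumns]; // full-width copy per value
            row[0] = id;
            rows.add(row);
        }
        BitSet buckets = new BitSet(numBuckets);
        for (Object[] row : rows) {
            buckets.set(bucketOf((Long) row[0], numBuckets));
        }
        return buckets;
    }

    /** Key-only style: extra space is O(numBuckets), independent of columns. */
    static BitSet pruneKeyOnly(List<Long> ids, int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        for (long id : ids) {
            buckets.set(bucketOf(id, numBuckets));
        }
        return buckets;
    }
}
```

Both methods select the same buckets; only the amount of memory held while
doing so differs, which is the property this issue asks to improve.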
!image-2023-03-06-17-23-35-119.png!
So we need to fix this problem to reduce the space complexity and improve
speed.
was:
My partner (Chenbo Lu) has encountered an OOM problem in their application,
which uses the Kudu Java client.
He collected some information and did a lot of analysis of this problem; I am
relaying his work in this issue.
The application was killed because of OOM very frequently. With a Java heap
of 8 GB (about 5.5 GB of it available), more than 10000 rows would not work.
The Kudu table in his case has about 1500 columns. His scan is like '{*}select *
from profile_wos where id in (...){*}'.
The problem happens when the KuduScanPredicate is an in-list predicate. Other
predicates have no problem.
He found that memory consumption is positively correlated with (number of ids *
number of columns). In fact, I think the length of each value in the in-list
columns is also a key variable.
When kudu api new scanner the memory reach a very high and multi-thread will
make the problem
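When data_loader scans Kudu, it starts multiple threads that scan different
tablets. When several threads initialize their scanners at the same time, a
large amount of memory is requested within a short period, which leads to OOM.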
!https://doc.sensorsdata.cn/download/attachments/360231828/image2023-2-7_15-56-12.png?version=1&modificationDate=1675756573000&api=v2!
Improve space complexity about prune hash partitions for in-list predicate
In the Java client, the logic that prunes hash partitions for in-list
predicates has a high space complexity, and it may cause the Java client to
run out of memory. At the same time, PartialRow is deep-copied many times,
which may be slow.
!image-2023-03-06-17-23-35-119.png!
So we need to fix this problem to reduce the space complexity and improve
speed.
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
> Key: KUDU-3455
> URL: https://issues.apache.org/jira/browse/KUDU-3455
> Project: Kudu
> Issue Type: Task
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
> Attachments: image-2023-03-06-17-23-35-119.png
>