[
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenxingwuying updated KUDU-3455:
---------------------------------
Description:
Improve space complexity about prune hash partitions for in-list predicate
In the java-client, the code that prunes hash partitions for in-list
predicates has high space complexity and may cause the client to run out
of memory. It also makes many deep copies of PartialRow, which may be
slow.
!image-2023-03-06-17-23-35-119.png!
So, we need to fix the problem to improve the
This patch fixes the problem with a recursive algorithm that uses a
depth-first-search-like approach to enumerate all combinations and
releases PartialRow objects as soon as possible.
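The depth-first idea can be sketched as follows. This is only an illustration, not the patch itself: the class HashBucketPruningSketch, the byte[][] working row that stands in for PartialRow, and the hashBucket helper that stands in for KeyEncoder.getHashBucket are all hypothetical. The point it shows is that a single working row is reused along the recursion, so extra memory grows with the number of hash columns rather than with the number of value combinations.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the java-client pruning logic; not the Kudu API. */
final class HashBucketPruningSketch {

  /**
   * Marks the hash bucket of every combination of per-column predicate values.
   * valuesPerColumn.get(i) holds the equality or in-list values of hash column i.
   */
  static BitSet pruneBuckets(List<List<byte[]>> valuesPerColumn, int numBuckets) {
    BitSet buckets = new BitSet(numBuckets);
    // One working row shared by the whole search (stand-in for PartialRow).
    byte[][] row = new byte[valuesPerColumn.size()][];
    dfs(valuesPerColumn, 0, row, numBuckets, buckets);
    return buckets;
  }

  private static void dfs(List<List<byte[]>> valuesPerColumn, int depth,
                          byte[][] row, int numBuckets, BitSet buckets) {
    if (depth == valuesPerColumn.size()) {
      // Every column has a value: hash this combination and record its bucket.
      buckets.set(hashBucket(row, numBuckets));
      return;
    }
    for (byte[] value : valuesPerColumn.get(depth)) {
      row[depth] = value; // overwrite in place instead of copying the row
      dfs(valuesPerColumn, depth + 1, row, numBuckets, buckets);
    }
  }

  /** Assumed hash function for illustration; the client uses KeyEncoder.getHashBucket. */
  static int hashBucket(byte[][] row, int numBuckets) {
    return Math.floorMod(Arrays.deepHashCode(row), numBuckets);
  }
}
```

The recursion depth equals the number of hash columns, and no list of intermediate rows is ever built, in contrast to the iterative version that keeps every partially filled row alive until the final loop.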
was:
Improve space complexity about prune hash partitions for in-list predicate
In the java-client, the code that prunes hash partitions for in-list
predicates has high space complexity and may cause the client to run out
of memory.
{code:java}
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each of the encoded string, replicate it by the number of values in
  // equality and in-list predicate.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}
{code}
This patch fixes the problem with a recursive algorithm that uses a
depth-first-search-like approach to enumerate all combinations and
releases PartialRow objects as soon as possible.
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
> Key: KUDU-3455
> URL: https://issues.apache.org/jira/browse/KUDU-3455
> Project: Kudu
> Issue Type: Task
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
> Attachments: image-2023-03-06-17-23-35-119.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)