[
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenxingwuying updated KUDU-3455:
---------------------------------
Description:
Improve space complexity about prune hash partitions for in-list predicate
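When the Java client prunes hash partitions for in-list predicates, the
current logic has high space complexity and may cause the client to run
out of memory. It materializes every combination of predicate values as a
separate PartialRow before hashing any of them, so the number of live rows
grows with the product of the in-list sizes: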
{code:java}
// java
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each of the encoded string, replicate it by the number of values in
  // equality and in-list predicate.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}
{code}
This patch fixes the problem with a recursive algorithm that walks the
predicate values in a depth-first-search-like manner to pick all
combinations, and tries to release PartialRow objects as soon as possible.
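
To illustrate the idea only (this is not the patch itself), here is a
minimal, self-contained sketch of a depth-first enumeration that keeps a
single reusable row buffer instead of a list of all combinations. It does
not use the Kudu client classes: the class name, the byte[][] row buffer,
and hashBucket() are hypothetical stand-ins for PartialRow and
KeyEncoder.getHashBucket(row, hashSchema).

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class HashBucketPruningSketch {

  // Hypothetical stand-in for KeyEncoder.getHashBucket(row, hashSchema):
  // combines the encoded cell values and maps them to a bucket index.
  static int hashBucket(byte[][] row, int numBuckets) {
    int h = 17;
    for (byte[] cell : row) {
      h = 31 * h + Arrays.hashCode(cell);
    }
    return Math.floorMod(h, numBuckets);
  }

  // Depth-first walk over the predicate values of each hash column.
  // Only one row buffer is alive at any time; the recursion depth equals
  // the number of hash columns, not the number of combinations.
  static void pickCombinations(List<List<byte[]>> valuesPerColumn,
                               int depth,
                               byte[][] row,
                               int numBuckets,
                               BitSet hashBuckets) {
    if (depth == valuesPerColumn.size()) {
      // A full combination has been picked: hash it and move on.
      hashBuckets.set(hashBucket(row, numBuckets));
      return;
    }
    for (byte[] value : valuesPerColumn.get(depth)) {
      row[depth] = value; // overwrite in place, no copy of the row
      pickCombinations(valuesPerColumn, depth + 1, row, numBuckets, hashBuckets);
    }
  }

  static BitSet pruneHashBuckets(List<List<byte[]>> valuesPerColumn, int numBuckets) {
    BitSet hashBuckets = new BitSet(numBuckets);
    byte[][] row = new byte[valuesPerColumn.size()][];
    pickCombinations(valuesPerColumn, 0, row, numBuckets, hashBuckets);
    return hashBuckets;
  }

  public static void main(String[] args) {
    // Two hash columns: an in-list with three values and one with two.
    List<List<byte[]>> values = Arrays.asList(
        Arrays.asList(new byte[] {1}, new byte[] {2}, new byte[] {3}),
        Arrays.asList(new byte[] {10}, new byte[] {20}));
    System.out.println(pruneHashBuckets(values, 8));
  }
}
```

With this shape, the extra memory is one row buffer plus a recursion
stack as deep as the number of hash columns, while the number of hash
computations stays equal to the number of combinations. An equality
predicate is simply a column whose value list has a single element.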
was:
[java] Improve space complexity about prune hash partitions for in-list
predicate
Pruning hash partitions for in-list predicate at java-client, the logic
codes has a high space complexity, and it may cause java-client out
of memory.
This patch fixes the problem and provide a recursive algorithm, that
uses a method like 'deep first search' to pick all combinations and
try to release PartialRow objects ASAP.
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
> Key: KUDU-3455
> URL: https://issues.apache.org/jira/browse/KUDU-3455
> Project: Kudu
> Issue Type: Task
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)