[ 
https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenxingwuying updated KUDU-3455:
---------------------------------
    Description: 
Improve space complexity about prune hash partitions for in-list predicate

    When the Java client prunes hash partitions for in-list predicates, the
    current logic has high space complexity and may cause the Java client to
    run out of memory. It also makes many deep copies of PartialRow, which
    may be slow.

 

!image-2023-03-06-17-23-35-119.png!

 

 

So, we need to fix the problem to improve the 

    This patch fixes the problem with a recursive algorithm that uses a
    depth-first search to enumerate all value combinations and tries to
    release PartialRow objects as soon as possible.
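
A minimal sketch of that idea, not the actual patch: it assumes plain byte[] values in place of PartialRow columns, and hashBucket() is a hypothetical stand-in for KeyEncoder.getHashBucket(row, hashSchema). The recursion fills one shared row buffer in place, so the extra memory grows with the number of hash columns rather than with the number of value combinations.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Sketch only: byte[] values stand in for the columns of a PartialRow. */
final class HashPruneSketch {

  /** Hypothetical stand-in for KeyEncoder.getHashBucket(row, hashSchema). */
  static int hashBucket(byte[][] row, int numBuckets) {
    return Math.floorMod(Arrays.deepHashCode(row), numBuckets);
  }

  /**
   * Marks the hash bucket of every combination of per-column values.
   * valuesPerColumn.get(i) holds the equality or in-list values of hash column i.
   */
  static BitSet pruneHashBuckets(List<List<byte[]>> valuesPerColumn, int numBuckets) {
    BitSet buckets = new BitSet(numBuckets);
    // One reusable buffer instead of a list holding every combination.
    byte[][] row = new byte[valuesPerColumn.size()][];
    visit(valuesPerColumn, 0, row, numBuckets, buckets);
    return buckets;
  }

  /** Depth-first walk: choose a value for column `depth`, then recurse. */
  private static void visit(List<List<byte[]>> valuesPerColumn, int depth,
                            byte[][] row, int numBuckets, BitSet buckets) {
    if (depth == valuesPerColumn.size()) {
      // Every column is set: hash this combination and record its bucket.
      buckets.set(hashBucket(row, numBuckets));
      return;
    }
    for (byte[] value : valuesPerColumn.get(depth)) {
      row[depth] = value; // overwrite in place, no per-combination copy
      visit(valuesPerColumn, depth + 1, row, numBuckets, buckets);
    }
  }
}
```

With in-lists of 3 and 2 values, for example, the leaf case runs 6 times, but only the single row buffer and a recursion stack of depth 2 are alive at any moment.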

  was:
Improve space complexity about prune hash partitions for in-list predicate

    When the Java client prunes hash partitions for in-list predicates, the
    current logic has high space complexity and may cause the Java client to
    run out of memory.

 

 
{code:java}
// java
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each of the encoded string, replicate it by the number of values in
  // equality and in-list predicate.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}

{code}
 
    This patch fixes the problem with a recursive algorithm that uses a
    depth-first search to enumerate all value combinations and tries to
    release PartialRow objects as soon as possible.


> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>         Attachments: image-2023-03-06-17-23-35-119.png
>
>
> Improve space complexity about prune hash partitions for in-list predicate
>     When the Java client prunes hash partitions for in-list predicates, the
>     current logic has high space complexity and may cause the Java client to
>     run out of memory. It also makes many deep copies of PartialRow, which
>     may be slow.
>  
> !image-2023-03-06-17-23-35-119.png!
>  
>  
> So, we need to fix the problem to improve the 
>     This patch fixes the problem with a recursive algorithm that uses a
>     depth-first search to enumerate all value combinations and tries to
>     release PartialRow objects as soon as possible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
