[ 
https://issues.apache.org/jira/browse/ORC-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799721#comment-17799721
 ] 

Alexander Petrossian (PAF) commented on ORC-1554:
-------------------------------------------------


Example:
{code:json}
{data: [{value: 0}, {value: 1}]} // rowIdx=0
{data: [{value: 2}]} // rowIdx=1
{code}

With the current code, a search for 1 (assuming the small problem is somehow solved) would call:
{code:java}
allowWithNegation(v=[0, 1, 2], rowIdx=1) // v[1] == 1
{code}
It would return *true*, and the record with rowIdx=1 would be returned to the client:
{code:json}
{data: [{value: 2}]} //rowIdx=1
{code}

This record does not contain *value: 1*.
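
To make the mismatch concrete, here is a minimal sketch, not a proposed patch. It assumes the LIST column keeps its elements flattened in one child array, with per-row offsets and lengths (the layout ListColumnVector exposes). The class and method names are hypothetical, and plain arrays stand in for the ORC vectors.

```java
// Hypothetical stand-in for a LIST column: child values are stored flattened,
// and row r owns the slice [offsets[r], offsets[r] + lengths[r]).
final class NestedIndexSketch {

    // Mirrors the problematic pattern: the flattened child is indexed by row index.
    static boolean naiveMatch(long[] child, int rowIdx, long wanted) {
        return rowIdx < child.length && child[rowIdx] == wanted;
    }

    // Row-aware check: scan only the elements that belong to the given row.
    static boolean rowAwareMatch(long[] child, long[] offsets, long[] lengths,
                                 int rowIdx, long wanted) {
        int start = (int) offsets[rowIdx];
        int end = start + (int) lengths[rowIdx];
        for (int i = start; i < end; i++) {
            if (child[i] == wanted) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Data from the example above: row 0 holds values 0 and 1, row 1 holds value 2.
        long[] child = {0, 1, 2};
        long[] offsets = {0, 2};
        long[] lengths = {2, 1};
        for (int row = 0; row < offsets.length; row++) {
            System.out.printf("rowIdx=%d naive=%b rowAware=%b%n", row,
                naiveMatch(child, row, 1),
                rowAwareMatch(child, offsets, lengths, row, 1));
        }
    }
}
```

In this sketch the naive check gets both rows wrong when searching for 1: it rejects rowIdx=0, which does contain the value, and accepts rowIdx=1, which does not.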

> Filtering by columns, nested in LISTs
> -------------------------------------
>
>                 Key: ORC-1554
>                 URL: https://issues.apache.org/jira/browse/ORC-1554
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 1.9.2
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major
>
> Currently searchArgument supports fields inside arrays, and that works.
> We even use deeply nested columns and it works fine; row groups get properly 
> included:
> {noformat}
> data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value
> {noformat}
> Alas, [allowSARGToFilter mechanism|ORC-743] does not handle values inside 
> arrays.
> Two show-stoppers here.
> Small
> https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/OrcFilterContext.java#L80:
>  
> {code:java}
>   static boolean isNull(ColumnVector[] vectorBranch, int idx) throws IllegalArgumentException {
>     for (ColumnVector v : vectorBranch) {
>       if (v instanceof ListColumnVector || v instanceof MapColumnVector) {
>         throw new IllegalArgumentException(String.format(
>           "Found vector: %s in branch. List and Map vectors are not supported in isNull "
>           + "determination", v));
>       }
> {code}
> Big
> https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/impl/filter/LeafFilter.java#L70
> {code:java}
>     ColumnVector[] branch = fc.findColumnVector(colName);
>     ColumnVector v = branch[branch.length - 1];
> ...
>         if (!OrcFilterContext.isNull(branch, rowIdx) &&
>             allowWithNegation(v, rowIdx)) {
> {code}
> Here the code indexes *v* with *rowIdx*, which is wrong if v is nested 
> inside a LIST (or MAP).
> The row index iterates over records,
> but v holds column values, of which there may be fewer or more than there 
> are records.
> The two are indexed differently.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
