[ 
https://issues.apache.org/jira/browse/ORC-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned ORC-323:
---------------------------------

    Assignee: Ashish Sharma

> Predicate push down for nested fields
> -------------------------------------
>
>                 Key: ORC-323
>                 URL: https://issues.apache.org/jira/browse/ORC-323
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Minor
>
> *1. Predicate Pushdown for Nested Fields*
> *1.1 Objective*
> In ORC (Optimized Row Columnar), every primitive-type column carries an 
> index. A predicate is a condition on a column named in the WHERE clause, and 
> pushdown means skipping row groups, stripes, and blocks while reading by 
> comparing the predicate against the metadata stored in the stripes. The 
> metadata consists of the min, max, and sum of the values present in the given 
> column. 
> Currently, predicate pushdown works only for top-level columns of the schema. 
> This issue proposes extending predicate pushdown to nested structures in Hive.
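To make the skipping idea concrete, here is a minimal, self-contained sketch of how min/max statistics can rule out a row group for a "column < literal" predicate. It does not use the ORC API: the class RowGroupSkipSketch, the Stats type, and both methods are hypothetical names introduced only for illustration, and the real reader evaluates a wider range of predicates and statistics.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupSkipSketch {
    /** Hypothetical per-row-group statistics for one long column. */
    static final class Stats {
        final long min;
        final long max;

        Stats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True if some value in [min, max] could satisfy "column < literal". */
    static boolean mightMatchLessThan(Stats stats, long literal) {
        // If even the smallest value is >= literal, no row can match.
        return stats.min < literal;
    }

    /** Returns the indexes of the row groups that still have to be read. */
    static List<Integer> rowGroupsToRead(List<Stats> groups, long literal) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (mightMatchLessThan(groups.get(i), literal)) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Stats> groups = new ArrayList<>();
        groups.add(new Stats(0L, 299_999L));
        groups.add(new Stats(300_000L, 599_999L));
        groups.add(new Stats(600_000L, 899_999L));
        // The third group is skipped because its minimum is not below 600000.
        System.out.println(rowGroupsToRead(groups, 600_000L)); // prints [0, 1]
    }
}
```

The check is deliberately conservative: a row group is kept whenever a match is possible, so skipping never drops a row that would satisfy the predicate.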
> *1.2 Current state*
>  
> *1.2.1 Schema*
> struct<int1:int, complex:struct<int2:int,String1:string>>
>  
> *1.2.2 Search Argument*
> SearchArgument sarg = SearchArgumentFactory.newBuilder()
>        .startAnd()
>        .startNot()
>        .lessThan("int2", PredicateLeaf.Type.LONG, 300000L)
>        .end()
>        .lessThan("int2", PredicateLeaf.Type.LONG, 600000L)
>        .end()
>        .build();
>  
> *1.2.3 Predicate pushdown is not supported for nested fields in ORC*
>  
> private boolean[] populatePpdSafeConversion() {
>     if (fileSchema == null || readerSchema == null
>         || readerFileTypes == null) {
>       return null;
>     }
>     boolean[] result = new boolean[readerSchema.getMaximumId() + 1];
>     boolean safePpd = validatePPDConversion(fileSchema, readerSchema);
>     result[readerSchema.getId()] = safePpd;
>     List<TypeDescription> children = readerSchema.getChildren();
>     if (children != null) {
>       for (TypeDescription child : children) {
>         TypeDescription fileType = getFileType(child.getId());
>         safePpd = validatePPDConversion(fileType, child);
>         result[child.getId()] = safePpd;
>       }
>     }
>     return result;
>   }
> populatePpdSafeConversion() checks the conversion only for the root and its 
> direct (top-level) children, so validation of a search argument on a nested 
> field fails.
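One way to cover nested fields is to visit every descendant of the reader schema instead of only the direct children. The sketch below shows that recursion on a stand-in tree and is not necessarily what the linked pull request does: NestedPpdSketch, Node, mark, and populate are hypothetical names, and the boolean `safe` field stands in for the result of validatePPDConversion(). The ids follow ORC's pre-order numbering for the example schema (root 0, int1 1, complex 2, int2 3, String1 4).

```java
import java.util.ArrayList;
import java.util.List;

public class NestedPpdSketch {
    /** Hypothetical stand-in for a schema node with a pre-order column id. */
    static final class Node {
        final int id;
        final boolean safe; // stand-in for validatePPDConversion(fileType, readerType)
        final List<Node> children = new ArrayList<>();

        Node(int id, boolean safe) {
            this.id = id;
            this.safe = safe;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Records the flag for this node, then recurses into all descendants. */
    static void mark(Node node, boolean[] result) {
        result[node.id] = node.safe;
        for (Node child : node.children) {
            mark(child, result);
        }
    }

    /** Builds the per-column array, indexed by column id. */
    static boolean[] populate(Node root, int maximumId) {
        boolean[] result = new boolean[maximumId + 1];
        mark(root, result);
        return result;
    }

    public static void main(String[] args) {
        // struct<int1:int, complex:struct<int2:int,String1:string>>
        Node root = new Node(0, true)
            .add(new Node(1, true))
            .add(new Node(2, true)
                .add(new Node(3, true))
                .add(new Node(4, false)));
        boolean[] flags = populate(root, 4);
        System.out.println(flags[3]); // prints true: the nested int2 is now covered
    }
}
```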
> static int findColumns(SchemaEvolution evolution,
>                          String columnName) {
>     TypeDescription readerSchema = evolution.getReaderBaseSchema();
>     List<String> fieldNames = readerSchema.getFieldNames();
>     List<TypeDescription> children = readerSchema.getChildren();
>     for (int i = 0; i < fieldNames.size(); ++i) {
>       if (columnName.equals(fieldNames.get(i))) {
>         TypeDescription result = evolution.getFileType(children.get(i));
>         return result == null ? -1 : result.getId();
>       }
>     }
>     return -1;
>   }
> findColumns() looks only at top-level columns. Because "int2" is a nested 
> column, it returns -1 instead of the index of "int2".
> *1.2.4 Result*
> PPD does not work for the int2 field in the search argument.
> *1.3 Expected state*
> *1.3.1 Schema*
> struct<int1:int, complex:struct<int2:int,String1:string>>
>  
> *1.3.2 Query*
> Replace the column name in PredicateLeaf with the fully qualified column path.
>  
> SearchArgument sarg = SearchArgumentFactory.newBuilder()
>        .startAnd()
>        .startNot()
>        .lessThan("complex.int2", PredicateLeaf.Type.LONG, 300000L)
>        .end()
>        .lessThan("complex.int2", PredicateLeaf.Type.LONG, 600000L)
>        .end()
>        .build();
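A fully qualified name such as "complex.int2" can be resolved by splitting it on dots and descending one struct level per segment. The following is a minimal sketch under stated assumptions, not the change made in the pull request: NestedColumnLookupSketch, Field, findColumn, and exampleSchema are hypothetical names, and the tree is a simplified stand-in for the reader schema with ORC's pre-order column ids.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedColumnLookupSketch {
    /** Hypothetical stand-in for a named schema field with a column id. */
    static final class Field {
        final String name;
        final int id;
        final List<Field> children = new ArrayList<>();

        Field(String name, int id) {
            this.name = name;
            this.id = id;
        }

        Field add(Field child) {
            children.add(child);
            return this;
        }
    }

    /** Resolves a dot-separated path to a column id, or -1 if any segment is missing. */
    static int findColumn(Field root, String path) {
        Field current = root;
        for (String part : path.split("\\.")) {
            Field next = null;
            for (Field child : current.children) {
                if (child.name.equals(part)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                return -1; // segment not found at this nesting level
            }
            current = next;
        }
        return current.id;
    }

    /** struct<int1:int, complex:struct<int2:int,String1:string>> */
    static Field exampleSchema() {
        return new Field("", 0)
            .add(new Field("int1", 1))
            .add(new Field("complex", 2)
                .add(new Field("int2", 3))
                .add(new Field("String1", 4)));
    }

    public static void main(String[] args) {
        Field schema = exampleSchema();
        System.out.println(findColumn(schema, "int2"));         // prints -1
        System.out.println(findColumn(schema, "complex.int2")); // prints 3
    }
}
```

A bare "int2" still resolves to -1 here because it is not a top-level field, which mirrors the failure described in 1.2.3, while the qualified path reaches the nested column.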
>  
> *1.3.3 Predicate pushdown support for nested fields*
> https://github.com/apache/orc/pull/232
> *1.3.4 Result*
> PPD works for the complex.int2 field in the search argument.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)