[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449481#comment-16449481 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r183633517
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ColumnExplorer.java ---
    @@ -156,43 +157,74 @@ public static boolean isPartitionColumn(String partitionDesignator, String path)
       }
     
       /**
    -   * Compares selection root and actual file path to determine partition columns values.
    -   * Adds implicit file columns according to columns list.
    +   * Creates a map with implicit columns where the key is the column name and the value is the column's actual value.
    +   * This map contains partition and implicit file columns (if requested).
    +   * Partition column names are formed based on the partition designator and value index.
        *
    -   * @return map with columns names as keys and their values
    +   * @param filePath file path, used to populate file implicit columns
    +   * @param partitionValues list of partition values
    +   * @param includeFileImplicitColumns whether file implicit columns should be included in the result
    +   * @return implicit columns map
        */
    -  public Map<String, String> populateImplicitColumns(FileWork work, String selectionRoot) {
    -    return populateImplicitColumns(work.getPath(), selectionRoot);
    -  }
    +  public Map<String, String> populateImplicitColumns(String filePath,
    +                                                     List<String> partitionValues,
    +                                                     boolean includeFileImplicitColumns) {
    +    Map<String, String> implicitValues = new LinkedHashMap<>();
    +    Map<String, String> implicitValues = new LinkedHashMap<>();
     
    -  /**
    -   * Compares selection root and actual file path to determine partition columns values.
    -   * Adds implicit file columns according to columns list.
    -   *
    -   * @return map with columns names as keys and their values
    -   */
    -  public Map<String, String> populateImplicitColumns(String filePath, String selectionRoot) {
    -    Map<String, String> implicitValues = Maps.newLinkedHashMap();
    -    if (selectionRoot != null) {
    -      String[] r = Path.getPathWithoutSchemeAndAuthority(new Path(selectionRoot)).toString().split("/");
    -      Path path = Path.getPathWithoutSchemeAndAuthority(new Path(filePath));
    -      String[] p = path.toString().split("/");
    -      if (p.length > r.length) {
    -        String[] q = ArrayUtils.subarray(p, r.length, p.length - 1);
    -        for (int a = 0; a < q.length; a++) {
    -          if (isStarQuery || selectedPartitionColumns.contains(a)) {
    -            implicitValues.put(partitionDesignator + a, q[a]);
    -          }
    -        }
    +    for(int i = 0; i < partitionValues.size(); i++) {
    --- End diff --
    
    `for (`
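The new `populateImplicitColumns` in the diff above builds the partition part of the map by concatenating the partition designator with the value index, in insertion order. A minimal standalone sketch of that naming scheme, assuming the default designator `dir` (the class and method names here are hypothetical, not Drill code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImplicitColumnsSketch {

    // Builds the partition portion of the implicit-columns map:
    // key = partitionDesignator + index (e.g. "dir0", "dir1"),
    // value = the corresponding partition value.
    public static Map<String, String> partitionColumns(String partitionDesignator,
                                                       List<String> partitionValues) {
        // LinkedHashMap preserves insertion order, matching the diff.
        Map<String, String> implicitValues = new LinkedHashMap<>();
        for (int i = 0; i < partitionValues.size(); i++) {
            implicitValues.put(partitionDesignator + i, partitionValues.get(i));
        }
        return implicitValues;
    }

    public static void main(String[] args) {
        Map<String, String> cols = partitionColumns("dir", List.of("2018", "04"));
        System.out.println(cols); // {dir0=2018, dir1=04}
    }
}
```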


> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed reading Hive parquet files with Drill's native parquet reader,
> but did not expose Hive data to Drill optimizations such as filter push down,
> limit push down, and count-to-direct-scan optimizations.
> Hive code had to be refactored to use the same interfaces as ParquetGroupScan
> in order to be exposed to such optimizations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
