[
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146652#comment-15146652
]
ASF GitHub Bot commented on DRILL-4287:
---------------------------------------
GitHub user amansinha100 opened a pull request:
https://github.com/apache/drill/pull/376
DRILL-4287: During initial DrillTable creation don't read the metadata
cache file; instead do it during ParquetGroupScan.
Maintain state in FileSelection to keep track of whether certain operations
have been done on that selection.
Remove ParquetFileSelection since its only purpose was to carry the
metadata cache information which is not needed anymore.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFileSelection.java
Resolve issues after rebasing:
1) JsonIgnore fileSelection in ParquetGroupScan
2) FileSystemPartitionDescriptor change.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java
DRILL-4287: Address code review comments and follow-up changes after
rebasing:
- In FileSelection: updated call to the Stopwatch, set all flags
appropriately in minusDirectories(), modify supportDirPruning()
- In ParquetGroupScan: Simplify directory checking in constructor, set the
parquetTableMetadata field after reading metadata cache.
- Fix unit tests to use an alias for the reserved dir<N> columns as
partition-by columns.
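
The state tracking in FileSelection described above can be illustrated
roughly as follows. This is only a sketch of the idea, not the Drill code:
the class TrackedSelection, the DirStatus enum, the directoryChecks counter
and the trailing-slash convention for directories are all assumptions made
for the example.

```java
// Hypothetical sketch; this is NOT the actual Drill FileSelection class.
final class TrackedSelection {
  // Remembers whether the directory check has already been performed.
  enum DirStatus { NOT_CHECKED, NO_DIRS, HAS_DIRS }

  private final String[] paths;
  private DirStatus dirStatus = DirStatus.NOT_CHECKED;
  private int directoryChecks = 0; // how many times the scan really ran

  TrackedSelection(String[] paths) {
    this.paths = paths.clone();
  }

  // Assumption for the sketch: a trailing '/' marks a directory.
  boolean containsDirectories() {
    if (dirStatus == DirStatus.NOT_CHECKED) {
      directoryChecks++;
      boolean found = false;
      for (String p : paths) {
        if (p.endsWith("/")) {
          found = true;
          break;
        }
      }
      dirStatus = found ? DirStatus.HAS_DIRS : DirStatus.NO_DIRS;
    }
    return dirStatus == DirStatus.HAS_DIRS;
  }

  // Returns a new selection without directories. The flag is set up front
  // so that the result never needs to be re-scanned.
  TrackedSelection minusDirectories() {
    String[] files = java.util.Arrays.stream(paths)
        .filter(p -> !p.endsWith("/"))
        .toArray(String[]::new);
    TrackedSelection result = new TrackedSelection(files);
    result.dirStatus = DirStatus.NO_DIRS;
    return result;
  }

  int size() { return paths.length; }

  int directoryChecks() { return directoryChecks; }
}
```

Recording the status on the selection itself means later callers can ask
the same question repeatedly without paying for another pass over the paths.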
More follow-up changes:
- Get rid of fileSelection attribute in ParquetGroupScan
- Initialize entries after expanding the selection when metadata cache is
used
- For non-metadata cache, don't do any expansion in the constructor; let
init() handle it
- In FileSystemPartitionDescriptor, createPartitionSublists is
modified to check for a Parquet scan
When reading from the metadata cache, ensure the selection root contains the
scheme and authority prefix. Minor refactoring.
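
As an illustration of what "contains the scheme and authority prefix" means
for a selection root, here is a minimal sketch that uses only java.net.URI.
It is not how the patch does it; the RootQualifier class, the qualify method
and the example host name are hypothetical.

```java
// Hypothetical helper; not part of Drill or Hadoop.
final class RootQualifier {
  private RootQualifier() {}

  // Fills in a missing scheme and authority from the default file system URI.
  static String qualify(String root, java.net.URI defaultFs) {
    if (defaultFs.getScheme() == null) {
      throw new IllegalArgumentException("default file system URI needs a scheme");
    }
    java.net.URI rootUri = java.net.URI.create(root);
    String scheme = rootUri.getScheme() != null
        ? rootUri.getScheme()
        : defaultFs.getScheme();
    String authority = rootUri.getAuthority();
    // Borrow the default authority only when the schemes agree.
    if (authority == null && scheme.equals(defaultFs.getScheme())) {
      authority = defaultFs.getAuthority();
    }
    try {
      return new java.net.URI(scheme, authority, rootUri.getPath(), null, null)
          .toString();
    } catch (java.net.URISyntaxException e) {
      throw new IllegalArgumentException("cannot qualify root: " + root, e);
    }
  }
}
```

For example, qualify("/data/parquet", URI.create("hdfs://namenode:8020"))
yields hdfs://namenode:8020/data/parquet, while a root that already carries
its own scheme and authority is returned unchanged.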
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/amansinha100/incubator-drill DRILL-4287-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/376.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #376
----
commit 79508e3d08baa49ec2d2d7480dd278e77b99e527
Author: Aman Sinha <[email protected]>
Date: 2016-01-18T18:26:59Z
DRILL-4287: During initial DrillTable creation don't read the metadata
cache file; instead do it during ParquetGroupScan.
----
> Do lazy reading of parquet metadata cache file
> ----------------------------------------------
>
> Key: DRILL-4287
> URL: https://issues.apache.org/jira/browse/DRILL-4287
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.4.0
> Reporter: Aman Sinha
> Assignee: Jinfeng Ni
>
> Currently, the parquet metadata cache file is read eagerly during creation of
> the DrillTable (as part of ParquetFormatMatcher.isReadable()). This is not
> desirable from a performance standpoint since there are scenarios where we want
> to do some up-front optimizations - e.g. directory-based partition pruning
> (see DRILL-2517) or potential limit 0 optimization etc. - and in such
> situations it is better to do lazy reading of the metadata cache file.
> This is a placeholder to perform such delayed reading since it is needed for
> the aforementioned optimizations.
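
The difference between eager and lazy reading described in the issue can be
sketched with a small memoizing holder. This is only an illustration under
assumptions: LazyMetadata is a hypothetical class, and the Supplier stands in
for whatever actually reads the metadata cache file.

```java
// Hypothetical holder; not a Drill class.
final class LazyMetadata<T> {
  private final java.util.function.Supplier<T> loader;
  private T value;
  private boolean loaded;

  // Constructing the holder is cheap: nothing is read here.
  LazyMetadata(java.util.function.Supplier<T> loader) {
    this.loader = loader;
  }

  // The loader runs on the first call only; later calls reuse the result.
  synchronized T get() {
    if (!loaded) {
      value = loader.get();
      loaded = true;
    }
    return value;
  }

  synchronized boolean isLoaded() {
    return loaded;
  }
}
```

With this shape, a planning step that decides it does not need the metadata
at all never triggers the read, which is the scenario the issue has in mind.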