[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452708#comment-16452708 ]
ASF GitHub Bot commented on DRILL-6331:
---------------------------------------
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183919198
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
@@ -158,25 +159,26 @@ public void close() {
     } catch (RuntimeException e) {
       ex = ex == null ? e : ex;
     }
-    try {
-      if (fs != null) {
+
+    for (DrillFileSystem fs : fileSystems) {
+      try {
         fs.close();
-        fs = null;
-      }
-    } catch (IOException e) {
+      } catch (IOException e) {
         throw UserException.resourceError(e)
-          .addContext("Failed to close the Drill file system for " + getName())
-          .build(logger);
+          .addContext("Failed to close the Drill file system for " + getName())
+          .build(logger);
+      }
     }
+
     if (ex != null) {
       throw ex;
     }
   }

   @Override
   public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
-    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
-    fs = new DrillFileSystem(conf, getStats());
+    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
--- End diff ---
I don't get why you need multiple DrillFileSystems per operator context.
The reason for the DrillFileSystem abstraction (and the reason for tying it to
the operator context) is to track the time a (scan) operator spends waiting for
a file system call to return. This is reported as the operator's wait time in
the query profile. For scans this is a critical number, since the time spent
waiting on disk reads determines whether the query is disk bound.
Associating multiple file system objects with a single operator context
will throw that accounting off. I think.
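To make the accounting point concrete, here is a minimal, self-contained
sketch of the idea, assuming a wrapper that times each blocking call. This is
not Drill's actual DrillFileSystem/OperatorStats code; every name in it is
hypothetical.

// Illustrative sketch only: how a file-system wrapper can attribute I/O wait
// time to a single operator's stats. Not Drill's real code; all names are
// hypothetical.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

class OperatorWaitStats {
  private final AtomicLong waitNanos = new AtomicLong();

  // Accumulate the wall-clock duration of one blocking file system call.
  void addWait(long nanos) {
    waitNanos.addAndGet(nanos);
  }

  long totalWaitNanos() {
    return waitNanos.get();
  }
}

class TimedFileSystem {
  private final OperatorWaitStats stats;

  TimedFileSystem(OperatorWaitStats stats) {
    this.stats = stats;
  }

  // Each blocking call is timed, so the operator's wait time in the query
  // profile reflects how long the (scan) operator sat in file system I/O.
  byte[] read(String path) throws IOException {
    long start = System.nanoTime();
    try {
      return doRead(path); // the actual blocking I/O
    } finally {
      stats.addWait(System.nanoTime() - start);
    }
  }

  private byte[] doRead(String path) throws IOException {
    return new byte[0]; // stand-in for a real read
  }
}

If several such file systems feed one stats object, each still adds its own
wait time, so the total only lines up with wall-clock wait if the operator
issues at most one blocking call at a time; overlapping calls from multiple
file systems would inflate the number, which is one way the math could get
thrown off.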
> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
> Key: DRILL-6331
> URL: https://issues.apache.org/jira/browse/DRILL-6331
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive
> Affects Versions: 1.13.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed Hive Parquet files to be read with Drill's native Parquet
> reader, but it did not expose Hive data to Drill optimizations such as filter
> push down, limit push down, and the count-to-direct-scan optimization (see
> the sketch after this description).
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to these optimizations.
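For illustration of what filter push down buys here: a Parquet-aware group
scan can prune row groups using their min/max column statistics before any
data is read. The sketch below is hypothetical; it is not Drill's
ParquetGroupScan API, and all names in it are made up for illustration.

// Hypothetical sketch of filter push down by row group pruning. Not Drill's
// actual ParquetGroupScan code; names and types are illustrative only.
import java.util.ArrayList;
import java.util.List;

class RowGroup {
  final String file;
  final long minValue; // min/max statistics for the filtered column
  final long maxValue;

  RowGroup(String file, long minValue, long maxValue) {
    this.file = file;
    this.minValue = minValue;
    this.maxValue = maxValue;
  }
}

class PruningScan {
  // Keep only row groups whose [min, max] range can contain the constant of
  // an equality filter (e.g. WHERE col = value); the rest are never read.
  static List<RowGroup> applyEqualityFilter(List<RowGroup> groups, long value) {
    List<RowGroup> kept = new ArrayList<>();
    for (RowGroup g : groups) {
      if (g.minValue <= value && value <= g.maxValue) {
        kept.add(g);
      }
    }
    return kept;
  }
}

Per the description above, the Hive-backed native Parquet scan could not reach
this kind of pruning until it was refactored to use the same interfaces as
ParquetGroupScan, which is what the planner's push down rules operate on.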
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)