Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184401600

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
           throw UserException.resourceError(e)
    -          .addContext("Failed to close the Drill file system for " + getName())
    -          .build(logger);
    +        .addContext("Failed to close the Drill file system for " + getName())
    +        .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }
     
       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff --

When the `AbstractParquetScanBatchCreator.getBatch` method is called, it receives a single operator context, which previously allowed only one file system to be created. It also receives an `AbstractParquetRowGroupScan`, which contains several row groups. Row groups may belong to different files.

For Drill parquet files, we create only one fs and use it to create the readers for every row group. That is why it was fine for the operator context to allow only one fs. But we needed to adjust this for Hive files. For Hive we need to create an fs for each file (since the config for each file system is different and is created using the projection pusher), so I had to change the operator context to allow more than one file system.

I have also introduced `AbstractDrillFileSystemManager`, which controls the number of file systems created. `ParquetDrillFileSystemManager` creates only one (as was done before). `HiveDrillNativeParquetDrillFileSystemManager` creates an fs for each file, so when two row groups belong to the same file, they get the same fs.

But I agree that for a tracking fs (i.e. when `store.parquet.reader.pagereader.async` is set to false) this will make a mess of the stats calculations. So I suggest the following fix: for Hive we will always create a non-tracking fs; for Drill the choice will depend on the `store.parquet.reader.pagereader.async` option. I'll also add checks in the operator context that disallow creating more than one tracking fs, and disallow creating a tracking fs at all when one or more non-tracking fs have already been created.
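
To make the proposal concrete, here is a rough sketch of the checks and of the per-file reuse described above. It is not the code in the PR: `SketchFileSystem`, `SketchOperatorContext`, `PerFileFileSystemManager`, and `fileSystemCount` are hypothetical stand-ins for illustration only, and letting non-tracking file systems be created after a tracking one is an assumption, since the comment does not say either way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DrillFileSystem: only records whether it tracks stats.
class SketchFileSystem {
  final boolean tracking;

  SketchFileSystem(boolean tracking) {
    this.tracking = tracking;
  }
}

// Hypothetical stand-in for the file system bookkeeping in the operator context.
class SketchOperatorContext {
  private final List<SketchFileSystem> fileSystems = new ArrayList<>();

  // Tracking fs: allowed only while no file system of any kind exists yet.
  // This covers both proposed checks: no second tracking fs, and no
  // tracking fs once non-tracking ones have been created.
  SketchFileSystem newFileSystem() {
    if (!fileSystems.isEmpty()) {
      throw new IllegalStateException(
          "Cannot create a tracking file system: other file systems already exist");
    }
    SketchFileSystem fs = new SketchFileSystem(true);
    fileSystems.add(fs);
    return fs;
  }

  // Non-tracking fs: any number may be created (assumption, see lead-in).
  SketchFileSystem newNonTrackingFileSystem() {
    SketchFileSystem fs = new SketchFileSystem(false);
    fileSystems.add(fs);
    return fs;
  }

  int fileSystemCount() {
    return fileSystems.size();
  }
}

// Hypothetical per-file manager: row groups from the same file share one fs.
class PerFileFileSystemManager {
  private final SketchOperatorContext context;
  private final Map<String, SketchFileSystem> byPath = new HashMap<>();

  PerFileFileSystemManager(SketchOperatorContext context) {
    this.context = context;
  }

  SketchFileSystem get(String path) {
    // Create a non-tracking fs on first use of a path, then reuse it.
    return byPath.computeIfAbsent(path, p -> context.newNonTrackingFileSystem());
  }
}
```

With this shape, the Hive path would go through something like `PerFileFileSystemManager` and never request a tracking fs, while the Drill path would call `newFileSystem()` or `newNonTrackingFileSystem()` exactly once, depending on the option.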
---