Paul Rogers created DRILL-5075:
----------------------------------
Summary: Tests complain about Parquet metadata parse errors in Drill-created files
Key: DRILL-5075
URL: https://issues.apache.org/jira/browse/DRILL-5075
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor
The test {{TestParquetWriter.testAllScalarTypes}} appears to create a Parquet
file, then read it back using the "new" Parquet reader. However, the test logs
the warning and exception shown at the end of this report, although the test
still succeeds.
Note that the exception does _not_ occur if we run the single test function by
itself. It only occurs when run as part of the entire test class, suggesting an
interaction between tests.
When run stand-alone, a different behavior occurs. Only when the test is
complete and the Drillbit shuts down does Parquet log a series of
"ColumnChunkPageWriteStore: written" messages, followed by:
{code}
WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
{code}
Are we leaving a file open that is getting flushed only on shut-down?
Full error when the test runs in the entire suite:
{code}
PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:66)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:264)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:568)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:545)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:455)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:381)
at org.apache.drill.exec.store.parquet.Metadata.access$0(Metadata.java:379)
at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:316)
at org.apache.drill.exec.store.parquet.Metadata$MetadataGatherer.runInner(Metadata.java:1)
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:56)
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:122)
at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:278)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:257)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:242)
at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:118)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:733)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:230)
at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:190)
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:169)
at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:1)
at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:145)
at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:103)
at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:65)
at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
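To illustrate why the parse fails, here is a minimal sketch that applies the format from the exception message to the bare string "parquet-mr". It assumes the parser requires a full-string match; the class name {{CreatedByCheck}}, the helper {{parses}}, and the well-formed sample string are hypothetical and are not taken from Drill or Parquet.

```java
import java.util.regex.Pattern;

public class CreatedByCheck {
    // Format copied from the exception message in this report.
    static final Pattern FORMAT =
        Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");

    // Assumption: the whole created_by string must match the format.
    static boolean parses(String createdBy) {
        return FORMAT.matcher(createdBy).matches();
    }

    public static void main(String[] args) {
        // The value reported in the warning: no " version ... (build ...)" suffix.
        System.out.println(parses("parquet-mr"));
        // Hypothetical well-formed value, for comparison only.
        System.out.println(parses("parquet-mr version 1.8.1 (build abc123)"));
    }
}
```

If this reading is right, the file footer carries only the application name in created_by, so the reader cannot determine the writer version and ignores the column statistics.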