[ https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080914#comment-16080914 ]
Paul Rogers commented on DRILL-4139:
------------------------------------
If we change the encoding of binary data, we must change the version of the
Parquet metadata file.
Yes, this means that old Drillbits cannot read new files. This restriction is
fine.
The problem occurs when we don't change the version: an old Drillbit reads
a new file and gets incorrect results because the file format changed.
An even worse problem is when a new Drillbit reads an old file and produces
wrong results.
So, IMHO, the rules for a Drillbit at version x that writes metadata file
version m are:
* Writes and reads version m.
* Reads version m-1 (at least; perhaps all versions from the last year).
* Rejects (ignores) files of version m+1 or later.
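The three rules can be sketched as a small classification function. This is a minimal sketch, assuming metadata versions are plain integers and that "versions from the last year" collapses to a configurable oldest readable version; `MetadataVersionPolicy` and its `Action` values are hypothetical names, not Drill's API. Files older than the oldest readable version are not covered by the rules, so the sketch flags them separately.

```java
/**
 * Hypothetical sketch of the proposed metadata-version rules.
 * Versions are modelled as plain integers (an assumption for illustration).
 */
final class MetadataVersionPolicy {

  /** What a Drillbit does with a metadata file of a given version. */
  enum Action {
    READ_WRITE,      // version m: the version this Drillbit writes
    READ_ONLY,       // m-1, or older down to the oldest readable version
    IGNORE,          // m+1 or later: skip the file, never delete it
    UNSUPPORTED_OLD  // older than anything we promise to read (not covered by the rules)
  }

  private final int currentVersion;        // m
  private final int oldestReadableVersion; // must be <= m-1

  MetadataVersionPolicy(int currentVersion, int oldestReadableVersion) {
    if (oldestReadableVersion > currentVersion - 1) {
      throw new IllegalArgumentException("A Drillbit must read at least version m-1");
    }
    this.currentVersion = currentVersion;
    this.oldestReadableVersion = oldestReadableVersion;
  }

  Action classify(int fileVersion) {
    if (fileVersion == currentVersion) {
      return Action.READ_WRITE;
    }
    if (fileVersion > currentVersion) {
      return Action.IGNORE;
    }
    if (fileVersion >= oldestReadableVersion) {
      return Action.READ_ONLY;
    }
    return Action.UNSUPPORTED_OLD;
  }
}
```

For example, `new MetadataVersionPolicy(3, 2).classify(4)` returns `IGNORE`, while `classify(2)` returns `READ_ONLY`.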
My suggestion for the last case is:
* Read the file header, detect a newer version, and log the error condition.
("Drillbit version x cannot read metadata file version v.")
* Ignore the file as if it does not exist. This will mean queries run slower.
Log a warning. ("Ignoring unsupported metadata file version v; query
performance may be slow.")
Running slow is better than failing.
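The proposed handling of a newer file (detect, log, ignore, keep running) might look roughly like this. It is a minimal sketch under stated assumptions: the file's first line is a hypothetical `version=N` header, `MetadataReader` is an invented name, and `java.util.logging` stands in for whatever logger Drill actually uses. An empty result means "plan the query as if the metadata file did not exist", which is slower but still correct. The method never deletes or rewrites the newer file, and for brevity it passes through every version at or below m without checking the lower bound from the rules.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical reader that ignores metadata files newer than it understands. */
final class MetadataReader {
  private static final Logger LOG = Logger.getLogger(MetadataReader.class.getName());
  private static final String HEADER_PREFIX = "version="; // assumed header format

  private final String drillbitVersion;  // x, used only in log messages
  private final int maxSupportedVersion; // m

  MetadataReader(String drillbitVersion, int maxSupportedVersion) {
    this.drillbitVersion = drillbitVersion;
    this.maxSupportedVersion = maxSupportedVersion;
  }

  /**
   * Returns the metadata body, or Optional.empty() if the file must be
   * ignored and the query planned without it.
   */
  Optional<String> read(String fileContents) {
    int newline = fileContents.indexOf('\n');
    String header = newline < 0 ? fileContents : fileContents.substring(0, newline);
    if (!header.startsWith(HEADER_PREFIX)) {
      LOG.warning("Ignoring metadata file with unrecognized header: " + header);
      return Optional.empty();
    }
    int fileVersion;
    try {
      fileVersion = Integer.parseInt(header.substring(HEADER_PREFIX.length()).trim());
    } catch (NumberFormatException e) {
      LOG.warning("Ignoring metadata file with unparseable version: " + header);
      return Optional.empty();
    }
    if (fileVersion > maxSupportedVersion) {
      // Step 1: report the error condition.
      LOG.warning(String.format(
          "Drillbit version %s cannot read metadata file version %d.", drillbitVersion, fileVersion));
      // Step 2: ignore the file (do not delete or rebuild it) and warn about performance.
      LOG.warning(String.format(
          "Ignoring unsupported metadata file version %d; query performance may be slow.", fileVersion));
      return Optional.empty();
    }
    // Version check passed; parsing the body is out of scope for this sketch.
    return Optional.of(newline < 0 ? "" : fileContents.substring(newline + 1));
  }
}
```

Returning an empty `Optional` rather than throwing keeps the "running slow is better than failing" choice explicit at the call site: the caller falls back to a full scan instead of aborting the query.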
Note that the Drillbit should *not* attempt to delete the new file and rebuild
an old one. Here is why.
Suppose a user tries to do a rolling upgrade. (Drill doesn't support that now,
but eventually it will.) A newer Drillbit creates a metadata file. If the old
Drillbit deleted it and created an old-format one, we'd have an oscillation: the
new Drillbit deletes the old file, the old one deletes the new file, and so on.
By leaving the new file, we avoid the oscillation. (Even better, later, we
should be able to tell the newer Drillbit to continue to create older files
until all Drillbits are upgraded. But that is a topic for later.)
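The oscillation argument can be checked with a toy model of a rolling upgrade: two Drillbits, writing metadata versions 3 and 4 (hypothetical numbers), take turns planning queries against the same directory. Under both policies a newer Drillbit replaces an older file, as in the scenario described; the only difference is whether the old Drillbit also replaces a newer file. This is an illustrative simulation, not Drill code, and `RollingUpgradeSimulation` is an invented name.

```java
/** Toy model of two Drillbit versions sharing one metadata file during a rolling upgrade. */
final class RollingUpgradeSimulation {

  /**
   * Counts how many times the metadata file is deleted and rebuilt.
   *
   * @param drillbitVersions   metadata versions of the Drillbits, which plan queries in turn
   * @param initialFileVersion version of the metadata file before the first query
   * @param oldRewritesNewer   true if a Drillbit deletes and rebuilds a file newer than its own
   * @param rounds             number of planning rounds to simulate
   */
  static int countRewrites(int[] drillbitVersions, int initialFileVersion,
                           boolean oldRewritesNewer, int rounds) {
    int fileVersion = initialFileVersion;
    int rewrites = 0;
    for (int i = 0; i < rounds; i++) {
      int mine = drillbitVersions[i % drillbitVersions.length];
      boolean rewrite;
      if (fileVersion < mine) {
        rewrite = true;             // newer Drillbit upgrades an older file
      } else if (fileVersion > mine) {
        rewrite = oldRewritesNewer; // the policy under discussion
      } else {
        rewrite = false;            // file already matches this Drillbit
      }
      if (rewrite) {
        fileVersion = mine;
        rewrites++;
      }
    }
    return rewrites;
  }
}
```

With `countRewrites(new int[] {3, 4}, 3, true, 10)` the file is rebuilt 9 times, once in every round after the first; with `oldRewritesNewer` set to `false` it is rebuilt once and then left alone.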
> Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types
> -----------------------------------------------------------------
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Volodymyr Vysotskyi
> Attachments: metadata file v3, metadata file with changes
>
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after a functional test run on a 4-node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235) ~[drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) [drill-java-exec-1.3.0.jar:1.3.0]
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) [drill-java-exec-1.3.0.jar:1.3.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}