[
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080834#comment-16080834
]
Paul Rogers commented on DRILL-4139:
------------------------------------
My first impression is that we are serializing bytes all wrong. Bytes are not
characters. Bytes span the full range from 0 to 255. Since our JSON is Unicode,
encoded as UTF-8, some combinations of bytes will be interpreted as multi-byte
characters in Unicode. That is, we are abusing the software.
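To make the problem concrete, here is a minimal sketch in plain Java. It is not Drill's serialization code; the class and method names are hypothetical. It pushes arbitrary bytes through a UTF-8 string and back. The byte 0xFF is never valid in UTF-8, so the decoder substitutes the replacement character U+FFFD and the original value is lost.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Drill.
class BytesAsTextDemo {

    // Treats raw bytes as UTF-8 text, then encodes that text back to bytes.
    static byte[] roundTripThroughUtf8(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x0A, (byte) 0xFF, 0x13, 0x2D};
        byte[] result = roundTripThroughUtf8(original);

        // 0xFF was replaced by U+FFFD, which re-encodes as three bytes,
        // so the arrays differ and the result is longer than the input.
        System.out.println(Arrays.equals(original, result)); // prints "false"
        System.out.println(result.length);                   // prints "7"
    }
}
```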
The correct format for bytes is a binary-to-text encoding. The most basic is hex:
{code}
'000AFF132D'
{code}
We interpret the above as two hex digits per byte, in left-to-right (lowest to
highest) address order.
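A minimal sketch of such a hex codec, assuming uppercase digits as in the example above. The {{HexCodec}} class and its method names are hypothetical and not an existing Drill API.

```java
// Hypothetical helper, not part of Drill.
class HexCodec {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    // Encodes each byte as two hex digits, in buffer (left-to-right) order.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    // Decodes a hex string back into the original bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x0A, (byte) 0xFF, 0x13, 0x2D};
        System.out.println(toHex(raw)); // prints "000AFF132D"
    }
}
```

Every byte value from 0 to 255 maps to exactly two ASCII characters, so the result is always safe to embed in a JSON string, at the cost of doubling the size.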
The Internet has a number of ways to store binary data in a more compact form.
Base64 (RFC 4648) is popular and has built-in support in Java (the
{{Base64.Encoder}} class). For example, here is a Base64 string:
'Vm9sb2R5bXlyIFZ5c290c2t5aQ=='
Base64 has the advantage that it is designed to be broken into lines, which can
be encoded in JSON as an array.
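A minimal sketch using the JDK's {{java.util.Base64}} (Java 8 and later). The {{Base64Demo}} class and its method names are hypothetical. The line-splitting variant uses the JDK's MIME encoder with a caller-chosen line length; how the resulting lines would be written into the JSON metadata is left open here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of Drill.
class Base64Demo {

    // Encodes bytes as a single Base64 string.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes a single Base64 string back into bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Encodes bytes as Base64 lines of at most lineLength characters.
    // The JDK rounds lineLength down to a multiple of 4.
    static String[] encodeLines(byte[] raw, int lineLength) {
        byte[] separator = "\n".getBytes(StandardCharsets.US_ASCII);
        String joined = Base64.getMimeEncoder(lineLength, separator).encodeToString(raw);
        return joined.split("\n");
    }

    // Re-joins the lines and decodes them back into bytes.
    static byte[] decodeLines(String[] lines) {
        return decode(String.join("", lines));
    }

    public static void main(String[] args) {
        byte[] raw = decode("Vm9sb2R5bXlyIFZ5c290c2t5aQ==");
        // The sample string happens to hold ASCII text.
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints "Volodymyr Vysotskyi"
        for (String line : encodeLines(raw, 8)) {
            System.out.println(line);
        }
    }
}
```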
Note that, for encoding purposes, we only care about the byte order in the
buffer: left-to-right. The meaning of those bytes is unimportant for
serialization. That is, whether the data is big-endian, little-endian,
stream-of-bytes, or stream-of-multi-byte characters is important to the code
that interprets the (decoded) bytes, but not to the serialization format.
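The point can be sketched as follows, again in plain Java with hypothetical names. The same integer is written in both byte orders, and each buffer survives a Base64 round trip unchanged. Only the code that reads the decoded bytes needs to know which order was used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of Drill.
class ByteOrderDemo {

    // Writes a 32-bit integer into a 4-byte buffer using the given byte order.
    static byte[] intToBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Reads a 32-bit integer from a 4-byte buffer using the given byte order.
    static int bytesToInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Serializes and deserializes the buffer without looking at its meaning.
    static byte[] roundTrip(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] big = intToBytes(1, ByteOrder.BIG_ENDIAN);       // 00 00 00 01
        byte[] little = intToBytes(1, ByteOrder.LITTLE_ENDIAN); // 01 00 00 00

        // Serialization preserves the buffer order in both cases.
        System.out.println(Arrays.equals(big, roundTrip(big)));       // prints "true"
        System.out.println(Arrays.equals(little, roundTrip(little))); // prints "true"

        // Interpretation is the reader's job: the wrong order gives a different value.
        System.out.println(bytesToInt(roundTrip(little), ByteOrder.LITTLE_ENDIAN)); // prints "1"
        System.out.println(bytesToInt(roundTrip(little), ByteOrder.BIG_ENDIAN));    // prints "16777216"
    }
}
```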
> Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types
> -----------------------------------------------------------------
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Volodymyr Vysotskyi
> Attachments: metadata file v3, metadata file with changes
>
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after Functional run on 4 node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) [drill-java-exec-1.3.0.jar:1.3.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}