Idan Sheinberg created DRILL-7629:
-------------------------------------
Summary: Parquet MAP field support missing in recent stable release (?)
Key: DRILL-7629
URL: https://issues.apache.org/jira/browse/DRILL-7629
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Affects Versions: 1.17.0
Environment: Drill 1.17
Zulu OpenJDK 8 build 1.8.0_232
Debian Buster 10.3
Kernel version 4.19.98-1
EC2 c5.2xlarge instances (8 cores, 16 GB RAM)
Reporter: Idan Sheinberg
I encountered this issue when lowering {{planner.slice_target}} (to, say, 100) in order to make Drill generate more fragments. Queries then started failing with the following error:
{code:java}
Caused by: java.io.IOException: Unable to parse column [`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, `value` ARRAY<DOUBLE>>>> not null]: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:80)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:61)
	at org.apache.drill.exec.record.metadata.AbstractColumnMetadata.createColumnMetadata(AbstractColumnMetadata.java:75)
	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.fasterxml.jackson.databind.introspect.AnnotatedMethod.call(AnnotatedMethod.java:109)
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283)
	... 72 common frames omitted
Caused by: org.apache.drill.exec.record.metadata.schema.parser.SchemaParsingException: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser$ErrorListener.syntaxError(SchemaExprParser.java:120)
	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:403)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column_def(SchemaParser.java:317)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.columns(SchemaParser.java:262)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_type(SchemaParser.java:1395)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_column(SchemaParser.java:579)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:383)
	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:78)
{code}
To be clear, all files in the queried directory are Parquet files that share the same schema.
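For context on the {{planner.slice_target}} setting used above: as we understand Drill's planner, the number of minor fragments for a major fragment is roughly its estimated cost (approximately, records processed) divided by {{planner.slice_target}}, then capped by per-node and per-query width limits (which we take to correspond to {{planner.width.max_per_node}} and {{planner.width.max_per_query}}). Lowering the value to 100 therefore multiplies the number of fragments. The following is a simplified, hypothetical model of that calculation; the class, method, and parameter names are ours, and it is not Drill's actual parallelizer code.

{code:java}
/**
 * Hypothetical, simplified model of how the parallelization width of one major fragment
 * is sized. Loosely based on our reading of Drill's planner; NOT Drill code.
 */
final class SliceTargetModel {

    private SliceTargetModel() {
    }

    /**
     * @param maxCost         estimated cost (roughly, records) of the costliest operator in the fragment
     * @param sliceTarget     value of planner.slice_target
     * @param maxWidthPerNode assumed per-node cap on minor fragments
     * @param activeNodes     number of Drillbits available to the query
     * @param maxQueryWidth   assumed per-query cap on minor fragments
     * @return estimated number of minor fragments
     */
    static int estimateWidth(double maxCost, long sliceTarget,
                             int maxWidthPerNode, int activeNodes, int maxQueryWidth) {
        if (sliceTarget <= 0) {
            throw new IllegalArgumentException("sliceTarget must be positive");
        }
        // Cost-based width: roughly one minor fragment per sliceTarget units of cost.
        int width = (int) Math.ceil(maxCost / sliceTarget);
        // Cap by the per-query limit, then by the cluster-wide per-node limit.
        width = Math.min(width, maxQueryWidth);
        width = Math.min(width, maxWidthPerNode * activeNodes);
        // Always run at least one minor fragment.
        return Math.max(1, width);
    }

    public static void main(String[] args) {
        double cost = 1_000_000; // hypothetical cost estimate
        // Three hypothetical nodes, at most 6 minor fragments each, per-query cap of 1000.
        System.out.println("slice_target=100000 -> " + estimateWidth(cost, 100_000, 6, 3, 1000)); // 10
        System.out.println("slice_target=100    -> " + estimateWidth(cost, 100, 6, 3, 1000));     // 18
    }
}
{code}

With these hypothetical inputs, the same cost estimate yields 10 fragments at a slice target of 100000 but 18 at a slice target of 100, where the per-node cap rather than the cost becomes the limit.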
Looking at the stack trace, this appears to be an ANTLR parsing error. Assuming {{SchemaParser}} is generated from [this|https://github.com/apache/drill/blob/drill-1.17.0/exec/vector/src/main/antlr4/org/apache/drill/exec/record/metadata/schema/parser/SchemaParser.g4] {{g4}} file, you can see that {{MAP}} support is lacking.
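As a rough illustration of why parsing stops at position 29 (the offset of {{MAP}} in the column string from the error), here is a toy recursive-descent parser in Java. It is a hypothetical sketch, not Drill's ANTLR grammar: the class name, the {{allowMap}} switch, the small set of scalar types, and the assumption (inferred only from the column string in the error) that {{MAP<...>}} wraps a column list the same way {{STRUCT<...>}} does are all ours.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy parser for Drill-style column strings. NOT Drill's SchemaParser.g4 grammar;
 * type keywords are case-sensitive here.
 */
final class ColumnTypeParser {
    // Assumed subset of scalar type names, just enough for the column in this report.
    private static final List<String> SCALARS = Arrays.asList("VARCHAR", "DOUBLE", "INT");

    private final String in;
    private final boolean allowMap; // false mimics a grammar that has no MAP alternative
    private int pos;

    private ColumnTypeParser(String in, boolean allowMap) {
        this.in = in;
        this.allowMap = allowMap;
    }

    /** Parses one column definition and returns a normalized rendering of it. */
    static String parseColumn(String input, boolean allowMap) {
        ColumnTypeParser p = new ColumnTypeParser(input, allowMap);
        String column = p.column();
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("unexpected trailing input at position [" + p.pos + "]");
        }
        return column;
    }

    // column := `name` type [not null]
    private String column() {
        expect('`');
        int end = in.indexOf('`', pos);
        if (end < 0) {
            throw new IllegalArgumentException("unterminated column name at position [" + pos + "]");
        }
        String rendered = "`" + in.substring(pos, end) + "` ";
        pos = end + 1;
        rendered += type();
        skipSpaces();
        if (in.regionMatches(true, pos, "not null", 0, 8)) {
            pos += 8;
            rendered += " not null";
        }
        return rendered;
    }

    // type := scalar | ARRAY<type> | STRUCT<column, ...> | MAP<column, ...>
    private String type() {
        skipSpaces();
        int start = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) {
            pos++;
        }
        String keyword = in.substring(start, pos);
        if (keyword.equals("ARRAY")) {
            expect('<');
            String element = type();
            expect('>');
            return "ARRAY<" + element + ">";
        }
        if (keyword.equals("STRUCT") || (allowMap && keyword.equals("MAP"))) {
            expect('<');
            List<String> columns = new ArrayList<>();
            do {
                columns.add(column());
            } while (consume(','));
            expect('>');
            return keyword + "<" + String.join(", ", columns) + ">";
        }
        if (SCALARS.contains(keyword)) {
            return keyword;
        }
        // Without MAP support, MAP falls through to here and is reported where the keyword starts.
        throw new IllegalArgumentException(
            "no viable alternative at input '" + keyword + "' at position [" + start + "]");
    }

    private boolean consume(char c) {
        skipSpaces();
        if (pos < in.length() && in.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!consume(c)) {
            throw new IllegalArgumentException("expected '" + c + "' at position [" + pos + "]");
        }
    }

    private void skipSpaces() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
    }
}
{code}

On the column string from the error, {{ColumnTypeParser.parseColumn(column, false)}} throws {{no viable alternative at input 'MAP' at position [29]}}, matching the offset Drill reported, while {{parseColumn(column, true)}} returns the string unchanged.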
Looking around a bit in Jira/GitHub, I noticed that this issue has already been fixed in DRILL-7361. I can also confirm that upgrading to the latest SNAPSHOT version (built from source today) resolved the issue.
A few questions:
* Did you intentionally drop Parquet MAP field support in Drill 1.17 as part of the ANTLR lexer refactoring, or was it never present to begin with (I see that 1.16 does not use ANTLR parsing for the Parquet schema)?
* Can we safely assume that the (newly added) MAP field support will persist from here on out, or at least be part of the 1.18 release?
* This is probably not the best place to ask, but is there already a timeline or plan for 1.18? Or is a hotfix release a possibility? I would much rather work with a stable release than a self-built one.
I can provide Parquet files and guidance for reproducing this issue in 1.17, should the need arise.
Thanks in advance!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)