[
https://issues.apache.org/jira/browse/DRILL-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arina Ielchiieva updated DRILL-7629:
------------------------------------
Fix Version/s: 1.18.0
> Parquet MAP field support missing in recent stable release (?)
> --------------------------------------------------------------
>
> Key: DRILL-7629
> URL: https://issues.apache.org/jira/browse/DRILL-7629
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.17.0
> Environment: Drill 1.17
> Zulu OpenJDK 8 build 1.8.0_232
> Debian Buster 10.3
> Kernel version 4.19.98-1
> EC2 c5.2xlarge instances (8 Cores, 16GB RAM)
> Reporter: Idan Sheinberg
> Priority: Major
> Fix For: 1.18.0
>
>
> I encountered this issue when lowering {{planner.slice_target}} (to, say, 100)
> in order to make Drill generate more fragments. Queries then started failing
> with the following error:
> {code:java}
> Caused by: java.io.IOException: Unable to parse column [`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, `value` ARRAY<DOUBLE>>>> not null]: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:80)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:61)
> at org.apache.drill.exec.record.metadata.AbstractColumnMetadata.createColumnMetadata(AbstractColumnMetadata.java:75)
> at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.fasterxml.jackson.databind.introspect.AnnotatedMethod.call(AnnotatedMethod.java:109)
> at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283)
> ... 72 common frames omitted
> Caused by: org.apache.drill.exec.record.metadata.schema.parser.SchemaParsingException: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser$ErrorListener.syntaxError(SchemaExprParser.java:120)
> at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
> at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
> at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
> at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:403)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column_def(SchemaParser.java:317)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.columns(SchemaParser.java:262)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_type(SchemaParser.java:1395)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_column(SchemaParser.java:579)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:383)
> at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:78)
> {code}
> To be clear, all files in the queried directory are Parquet files that share
> the same schema.
> Looking at the stack trace, this appears to be an {{antlr}} parsing error.
> Assuming {{SchemaParser}} is generated from
> [this|https://github.com/apache/drill/blob/drill-1.17.0/exec/vector/src/main/antlr4/org/apache/drill/exec/record/metadata/schema/parser/SchemaParser.g4]
> {{g4}} file, you can see that {{MAP}} support is lacking.
> Looking around a bit in Jira/GitHub, I noticed that this issue had already
> been fixed in DRILL-7361. I can also confirm that upgrading to the latest
> SNAPSHOT version (built from source today) resolved the issue.
> A few questions:
> * Did you intentionally drop Parquet MAP field support in Drill 1.17 as
> part of the Antlr lexer refactoring, or was it never present to begin with (I
> see 1.16 does not use Antlr parsing for the Parquet schema)?
> * Can we safely assume the (newly added) MAP field support will persist from
> here on out, or at least as part of the 1.18 release?
> * Probably not the best place to ask, but is there a timeline/plan for 1.18
> already? Or is there a possibility of a hot-fix release? I would much
> prefer to work on a stable version rather than a self-built one.
> I'd be able to provide parquet files and guidance towards re-creating this
> issue in 1.17, should the need arise.
> Thanks in advance!
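For readers unfamiliar with the "no viable alternative" message quoted above, here is a minimal sketch of how a type grammar with no {{MAP}} rule rejects the reported column string. It is not Drill's {{SchemaParser}}: the class {{ToyColumnParser}}, its methods, and its tiny grammar are hypothetical. Treating {{MAP<...>}} like {{STRUCT<...>}} is an assumption made only for illustration and says nothing about how the actual grammar defines it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy recursive-descent parser for a column string like the one in the report.
// NOT Drill's SchemaParser; the grammar and all names are illustrative assumptions.
class ToyColumnParser {
    private final List<String> tokens = new ArrayList<>();
    private final boolean mapSupported;
    private int pos = 0;

    ToyColumnParser(String input, boolean mapSupported) {
        this.mapSupported = mapSupported;
        int i = 0;
        // Split the input into `quoted` names, upper-cased keywords, and < > , symbols.
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '<' || c == '>' || c == ',') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == '`') {
                int end = input.indexOf('`', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated identifier");
                tokens.add(input.substring(i, end + 1));
                i = end + 1;
            } else {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                if (i == start) throw new IllegalArgumentException("unexpected character '" + c + "'");
                tokens.add(input.substring(start, i).toUpperCase());
            }
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalArgumentException("expected " + token + " but found " + peek());
        }
        pos++;
    }

    // column := `name` type
    private void column() {
        if (!peek().startsWith("`")) {
            throw new IllegalArgumentException("expected column name but found " + peek());
        }
        pos++;
        type();
    }

    // type := ARRAY<type> | STRUCT<columns> | MAP<columns> (only if enabled) | simple type
    private void type() {
        String t = peek();
        if (t.equals("ARRAY")) {
            pos++;
            expect("<");
            type();
            expect(">");
        } else if (t.equals("STRUCT") || (mapSupported && t.equals("MAP"))) {
            pos++;
            expect("<");
            column();
            while (peek().equals(",")) {
                pos++;
                column();
            }
            expect(">");
        } else if (t.equals("VARCHAR") || t.equals("DOUBLE") || t.equals("INT")) {
            pos++;
        } else {
            // No grammar alternative starts with this token.
            throw new IllegalArgumentException("no viable alternative at input '" + t + "'");
        }
    }

    // Parses one column definition with an optional trailing NOT NULL; throws on failure.
    static void parse(String input, boolean mapSupported) {
        ToyColumnParser p = new ToyColumnParser(input, mapSupported);
        p.column();
        if (p.peek().equals("NOT")) {
            p.pos++;
            p.expect("NULL");
        }
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected trailing input " + p.peek());
        }
    }

    static boolean parses(String input, boolean mapSupported) {
        try {
            parse(input, mapSupported);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With the column string from the report, {{ToyColumnParser.parses(col, false)}} returns false because no alternative in {{type()}} accepts the {{MAP}} token, while {{ToyColumnParser.parses(col, true)}} returns true once that alternative exists.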