Idan Sheinberg created DRILL-7629:
-------------------------------------

             Summary: Parquet MAP field support missing in recent stable release (?)
                 Key: DRILL-7629
                 URL: https://issues.apache.org/jira/browse/DRILL-7629
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.17.0
         Environment: Drill 1.17
Zulu OpenJDK 8 build 1.8.0_232
Debian Buster 10.3
Kernel version 4.19.98-1
EC2 c5.2xlarge instances (8 Cores, 16GB RAM)
            Reporter: Idan Sheinberg


I encountered this issue after lowering {{planner.slice_target}} (to, say, 100) to make Drill generate more fragments. Queries then started failing with the following error:
{code:java}
Caused by: java.io.IOException: Unable to parse column [`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, `value` ARRAY<DOUBLE>>>> not null]: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:80)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:61)
        at org.apache.drill.exec.record.metadata.AbstractColumnMetadata.createColumnMetadata(AbstractColumnMetadata.java:75)
        at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.fasterxml.jackson.databind.introspect.AnnotatedMethod.call(AnnotatedMethod.java:109)
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283)
        ... 72 common frames omitted
Caused by: org.apache.drill.exec.record.metadata.schema.parser.SchemaParsingException: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser$ErrorListener.syntaxError(SchemaExprParser.java:120)
        at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
        at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:403)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column_def(SchemaParser.java:317)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.columns(SchemaParser.java:262)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_type(SchemaParser.java:1395)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_column(SchemaParser.java:579)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:383)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:78)
{code}
To be clear, all files in the queried directory are Parquet files that share the same schema.

Looking at the stack trace, this appears to be an {{antlr}} parsing error. Assuming {{SchemaParser}} is generated from [this|https://github.com/apache/drill/blob/drill-1.17.0/exec/vector/src/main/antlr4/org/apache/drill/exec/record/metadata/schema/parser/SchemaParser.g4] {{g4}} file, you can see that it lacks {{MAP}} support.
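To illustrate why parsing stops at the {{MAP}} token (position 29 in the column definition above), here is a toy Java sketch. It is NOT Drill's ANTLR-generated {{SchemaParser}}: the keyword set below is a hypothetical, simplified stand-in for a type grammar that has no {{MAP}} rule, and the class and method names are made up for illustration. It strips backtick-quoted identifiers and reports the first type keyword that such a grammar would reject.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnsupportedTypeCheck {
    // Hypothetical, simplified keyword set standing in for a schema grammar
    // without a MAP rule. This is NOT the actual contents of SchemaParser.g4.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
        "STRUCT", "ARRAY", "VARCHAR", "DOUBLE", "INT", "BIGINT", "BOOLEAN",
        "NOT", "NULL")); // nullability modifiers are accepted as well

    // Backtick-quoted identifiers (e.g. `map`) are column names, not types.
    private static final Pattern QUOTED = Pattern.compile("`[^`]*`");
    private static final Pattern WORD = Pattern.compile("[A-Za-z_]+");

    /** Returns the first keyword outside SUPPORTED, or empty if all are known. */
    public static Optional<String> firstUnsupported(String columnDef) {
        // Blank out quoted identifiers so only type keywords remain.
        String stripped = QUOTED.matcher(columnDef).replaceAll(" ");
        Matcher m = WORD.matcher(stripped);
        while (m.find()) {
            String keyword = m.group().toUpperCase(Locale.ROOT);
            if (!SUPPORTED.contains(keyword)) {
                return Optional.of(keyword);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Column type taken from the error message above.
        String column = "STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, "
            + "`value` ARRAY<DOUBLE>>>> not null";
        System.out.println("First unsupported keyword: "
            + firstUnsupported(column).orElse("(none)"));
    }
}
```

Running {{main}} prints {{First unsupported keyword: MAP}}, mirroring where the real parser reports "no viable alternative". A real ANTLR grammar fails on structure rather than a flat keyword list, so this only illustrates the symptom.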

Looking around a bit in Jira/GitHub, I noticed that this issue has already been fixed in DRILL-7361. I can also confirm that upgrading to the latest SNAPSHOT version (built from source today) resolved the issue.

A few questions:

 * Did you intentionally drop Parquet MAP field support in Drill 1.17 as part of the ANTLR lexer refactoring, or was it never present to begin with (as far as I can see, 1.16 does not use ANTLR parsing for the Parquet schema)?

 * Can we safely assume that the (newly added) MAP field support will persist from here on, or at least be part of the 1.18 release?

 * This is probably not the best place to ask, but is there already a timeline or plan for 1.18? Alternatively, is a hotfix release possible? I would much rather run a stable release than a self-built one.

I can provide Parquet files and guidance for reproducing this issue in 1.17, should the need arise.

Thanks in advance!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
