[ https://issues.apache.org/jira/browse/SPARK-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680418#comment-14680418 ]
Cheng Lian commented on SPARK-9340:
-----------------------------------

Thanks for the clarification. In [PR #8070|https://github.com/apache/spark/pull/8070] I am only trying to do the "required list of required elements" conversion. I understand that cleaning up all that compatibility stuff can be super time-consuming, and making sure the most common scenarios work first totally makes sense.

I'm so glad that all the backwards-compatibility rules had already been figured out by the time I started to investigate these issues. These rules definitely saved my world!

> ParquetTypeConverter incorrectly handling of repeated types results in schema mismatch
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-9340
>                 URL: https://issues.apache.org/jira/browse/SPARK-9340
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Damian Guy
>         Attachments: ParquetTypesConverterTest.scala
>
>
> The way ParquetTypesConverter handles primitive repeated types results in an
> incompatible schema being used for querying data. For example, given a schema
> like so:
>
> message root {
>    repeated int32 repeated_field;
> }
>
> Spark produces a read schema like:
>
> message root {
>    optional int32 repeated_field;
> }
>
> These are incompatible and all attempts to read fail.
>
> In ParquetTypesConverter.toDataType:
>
> if (parquetType.isPrimitive) {
>   toPrimitiveDataType(parquetType.asPrimitiveType, isBinaryAsString, isInt96AsTimestamp)
> } else {...}
>
> The if condition should also have
> !parquetType.isRepetition(Repetition.REPEATED)
>
> And then this case will need to be handled in the else branch.
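
For illustration, the "required list of required elements" conversion discussed in this thread might be sketched as below. This is a minimal sketch and not the code in PR #8070: every type here (Repetition, PrimitiveField, GroupField, DataType, StructField, and so on) is a hypothetical, simplified stand-in rather than a Parquet or Spark class, and LIST/MAP-annotated groups are deliberately left out. The only point it makes is that a repeated field is mapped to a non-nullable array of non-nullable elements instead of falling through the primitive branch.

```scala
// Hypothetical stand-ins for Parquet's repetition levels (assumption, not the Parquet API).
sealed trait Repetition
case object Required extends Repetition
case object Optional extends Repetition
case object Repeated extends Repetition

// Hypothetical, simplified model of a Parquet schema node.
sealed trait ParquetType { def name: String; def repetition: Repetition }
final case class PrimitiveField(name: String, repetition: Repetition, primitive: String)
  extends ParquetType
final case class GroupField(name: String, repetition: Repetition, fields: List[ParquetType])
  extends ParquetType

// Hypothetical, simplified model of the SQL-side types.
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean)
final case class StructType(fields: List[StructField]) extends DataType

object RepeatedFieldSketch {
  // Maps a primitive name to a SQL type; only a few primitives are covered here.
  private def toPrimitiveDataType(p: PrimitiveField): DataType = p.primitive match {
    case "int32"  => IntegerType
    case "int64"  => LongType
    case "binary" => StringType
    case other    => throw new IllegalArgumentException(s"Unsupported primitive type: $other")
  }

  // Type of a single value of the field, ignoring the field's own repetition.
  private def elementType(t: ParquetType): DataType = t match {
    case p: PrimitiveField => toPrimitiveDataType(p)
    case g: GroupField     => StructType(g.fields.map(toField))
  }

  // The repetition check comes first, so a repeated primitive never reaches
  // the plain primitive path that produced the mismatched read schema.
  def toField(t: ParquetType): StructField = t.repetition match {
    case Repeated =>
      // Required list of required elements.
      StructField(t.name, ArrayType(elementType(t), containsNull = false), nullable = false)
    case Optional =>
      StructField(t.name, elementType(t), nullable = true)
    case Required =>
      StructField(t.name, elementType(t), nullable = false)
  }
}
```

With the schema from the issue, RepeatedFieldSketch.toField(PrimitiveField("repeated_field", Repeated, "int32")) yields a non-nullable array-of-int field rather than an optional int32.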