Hi, I've recently hit an issue with the parquet schema produced by parquet-protobuf. Because it is not consistent with what the Avro and Thrift converters produce, a downstream app, Spark SQL, couldn't read files containing repeated types. We've since had this fixed here: https://issues.apache.org/jira/browse/SPARK-9340
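For context, the mismatch looks roughly like this (an illustrative sketch, not the exact schemas; the field names `values` and `array` are hypothetical): parquet-protobuf writes a protobuf `repeated` field as a bare repeated primitive, while the other converters wrap the repeated element in an annotated group:

```
// protobuf input (illustrative)
message Example {
  repeated int32 values = 1;
}

// parquet-protobuf output: bare repeated field, no LIST annotation
message Example {
  repeated int32 values;
}

// Avro-converter style output: LIST-annotated group wrapping the repeated element
message Example {
  required group values (LIST) {
    repeated int32 array;
  }
}
```

A reader has to special-case both shapes when deciding whether a repeated field is itself the list element or a wrapper around one, which is exactly where Spark SQL tripped up.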
While it is possible for downstream users of parquet to handle the incompatibilities between the various input formats, it seems a shame that parquet doesn't produce a consistent schema across all of them. A consistent schema would make working with parquet much simpler, as the rules for converting to/from a list, etc., would always be the same, and it would make for simpler, less error-prone code. Besides, I thought this was one of the reasons for using parquet...

I submitted a pull request that addresses the issue we have been facing: https://github.com/apache/parquet-mr/pull/253

Is there any reason why you wouldn't want to have a consistent parquet representation?

Thanks,
Damian
