[ https://issues.apache.org/jira/browse/PARQUET-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian resolved PARQUET-194.
--------------------------------
    Resolution: Not a Problem

Thanks [~rdblue], I'm closing this.

> Provide callback to allow user defined key-value metadata merging strategy
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-194
>                 URL: https://issues.apache.org/jira/browse/PARQUET-194
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>            Reporter: Cheng Lian
>
> When merging footers, Parquet doesn't know how to merge conflicting
> user-defined key-value metadata entries, and simply throws. It would be
> better to provide callbacks to let users define metadata merging strategies.
>
> For example, in Spark SQL, we store our own schema information in Parquet
> files as key-value metadata (similar to parquet-avro). While trying to add
> schema merging support for reading Parquet files with different but
> compatible schemas, {{InitContext.getMergedKeyValueMetaData}} throws because
> we have different Spark SQL schemas stored in different Parquet data files.
> Thus, we have to override {{ParquetInputFormat}} and merge the schema within
> {{getSplits}}, which is kinda hacky and inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
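
For readers who want a concrete picture of the callback the issue asks for, here is a rough, self-contained sketch in plain Java. It is not parquet-mr code: `MetadataMergeSketch`, `KeyValueMergeStrategy`, `STRICT`, `mergeKeyValueMetadata`, and the `app.schema` key are all hypothetical names made up for illustration. The sketch assumes the per-key values from all footers have already been collected into a `Map<String, Set<String>>`, and that a caller-supplied strategy decides how to collapse each set into one value.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in parquet-mr.
public class MetadataMergeSketch {

    /** Hypothetical callback: collapses the distinct values seen for one key. */
    public interface KeyValueMergeStrategy {
        String merge(String key, Set<String> values);
    }

    /** Strategy that throws on any conflict, like the behavior the issue describes. */
    public static final KeyValueMergeStrategy STRICT = (key, values) -> {
        if (values.size() > 1) {
            throw new IllegalStateException(
                "conflicting values for key " + key + ": " + values);
        }
        return values.iterator().next();
    };

    /** Applies the strategy to every key that has at least one value. */
    public static Map<String, String> mergeKeyValueMetadata(
            Map<String, Set<String>> perKeyValues, KeyValueMergeStrategy strategy) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : perKeyValues.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue; // nothing to merge for this key
            }
            merged.put(e.getKey(), strategy.merge(e.getKey(), e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two footers disagree on the (made-up) "app.schema" key.
        Map<String, Set<String>> footers = new LinkedHashMap<>();
        footers.computeIfAbsent("writer", k -> new LinkedHashSet<>()).add("spark");
        footers.computeIfAbsent("app.schema", k -> new LinkedHashSet<>()).add("a:int");
        footers.computeIfAbsent("app.schema", k -> new LinkedHashSet<>())
               .add("a:int,b:string");

        // Toy user strategy: keep the longest value. A real one would parse
        // and merge the schemas instead.
        KeyValueMergeStrategy longest = (key, values) ->
            values.stream().max(Comparator.comparingInt(String::length)).get();

        System.out.println(mergeKeyValueMetadata(footers, longest));
    }
}
```

Passing `STRICT` instead of `longest` in the example would throw on the `app.schema` key, which mirrors the failure described in the issue; the callback only moves the conflict decision to the caller.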