[ https://issues.apache.org/jira/browse/PARQUET-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513086#comment-15513086 ]
Kristoffer Sjögren commented on PARQUET-697:
--------------------------------------------

Any comments on this?

> ProtoMessageConverter fails for unknown proto fields
> ----------------------------------------------------
>
>                 Key: PARQUET-697
>                 URL: https://issues.apache.org/jira/browse/PARQUET-697
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Kristoffer Sjögren
>
> Hi,
>
> We have a Spark application that reads Parquet files and turns them into a
> Protobuf RDD, as in the code below [1]. However, if the Parquet schema contains
> fields that don't exist in the Protobuf class, an
> IncompatibleSchemaModificationException [2] is thrown.
>
> For compatibility reasons it would be nice to be able to ignore such fields
> instead of throwing an exception, perhaps controlled by a configuration option.
> The fix for ignoring fields is quite easy: just instantiate an empty
> PrimitiveConverter instead [3].
>
> Cheers,
> -Kristoffer
>
> [1]
>     JobConf conf = new JobConf(ctx.hadoopConfiguration());
>     FileInputFormat.setInputPaths(conf, rawPath);
>     // Tell the read support which generated protobuf class to materialize.
>     ProtoReadSupport.setProtobufClass(conf, Msg.class.getName());
>     // Keys are Void; values are builders of the configured protobuf message.
>     NewHadoopRDD<Void, Msg.Builder> rdd =
>         new NewHadoopRDD(ctx.sc(), ProtoParquetInputFormat.class, Void.class,
>             Msg.Builder.class, conf);
>     rdd.toJavaRDD().foreach(log -> {
>         System.out.println(log._2());
>     });
>
> [2] https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java#L84
>
> [3] converters[parquetFieldIndex - 1] = new PrimitiveConverter() {};
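A minimal, self-contained Java sketch of the behavior requested in the issue: when a Parquet field has no matching protobuf field, either throw (the current behavior pointed to in [2]) or, if an opt-in flag is set, install a converter that silently discards values, analogous to the empty PrimitiveConverter in [3]. FieldConverter, ConverterSelection, buildConverters, IncompatibleSchemaException and the ignoreUnknownFields flag are all hypothetical stand-ins, not parquet-mr API; the real ProtoMessageConverter works with Parquet Type and Converter objects. One caveat, stated as an assumption: if the unknown field is a nested group rather than a primitive, a real patch would presumably need a no-op GroupConverter rather than a PrimitiveConverter for that field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Parquet field converter (NOT the parquet-mr Converter API).
interface FieldConverter {
    void addValue(Object value);
}

// Hypothetical sketch of per-field converter selection with an opt-in "ignore unknown fields" flag.
final class ConverterSelection {

    // Hypothetical stand-in for IncompatibleSchemaModificationException.
    static final class IncompatibleSchemaException extends RuntimeException {
        IncompatibleSchemaException(String message) {
            super(message);
        }
    }

    // No-op converter: values for unknown fields are read and then dropped,
    // analogous to `new PrimitiveConverter() {}` in the proposed fix.
    static final FieldConverter IGNORE = value -> { };

    private ConverterSelection() { }

    // Builds one converter per Parquet field, in Parquet schema order.
    static FieldConverter[] buildConverters(List<String> parquetFieldNames,
                                            Map<String, FieldConverter> protoFieldConverters,
                                            boolean ignoreUnknownFields) {
        FieldConverter[] converters = new FieldConverter[parquetFieldNames.size()];
        for (int i = 0; i < parquetFieldNames.size(); i++) {
            String name = parquetFieldNames.get(i);
            FieldConverter known = protoFieldConverters.get(name);
            if (known != null) {
                converters[i] = known;   // field exists in the protobuf class
            } else if (ignoreUnknownFields) {
                converters[i] = IGNORE;  // opt-in: skip the unknown field
            } else {
                // Default: fail fast, matching the behavior reported in the issue.
                throw new IncompatibleSchemaException(
                        "Cannot find \"" + name + "\" in the protobuf descriptor");
            }
        }
        return converters;
    }

    public static void main(String[] args) {
        List<Object> ids = new ArrayList<>();
        Map<String, FieldConverter> protoFields = new HashMap<>();
        protoFields.put("id", ids::add); // known field: collect its values

        // "newColumn" is in the Parquet schema but not in the protobuf class.
        FieldConverter[] converters =
                buildConverters(Arrays.asList("id", "newColumn"), protoFields, true);
        converters[0].addValue(1L);
        converters[1].addValue("dropped");
        System.out.println(ids); // prints [1]
    }
}
```

Defaulting the flag to false keeps today's fail-fast behavior for existing readers, so only jobs that explicitly opt in would start silently dropping columns.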