[ 
https://issues.apache.org/jira/browse/PARQUET-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513086#comment-15513086
 ] 

Kristoffer Sjögren commented on PARQUET-697:
--------------------------------------------

Any comments on this?

> ProtoMessageConverter fails for unknown proto fields
> ----------------------------------------------------
>
>                 Key: PARQUET-697
>                 URL: https://issues.apache.org/jira/browse/PARQUET-697
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Kristoffer Sjögren
>
> Hi,
> We have a Spark application that reads Parquet files and turns them into a 
> Protobuf RDD, as in the code below [1]. However, if the Parquet schema contains 
> fields that don't exist in the protobuf class, an 
> IncompatibleSchemaModificationException is thrown [2]. 
> For compatibility reasons it would be nice to be able to ignore such 
> fields instead of throwing an exception, perhaps via a configuration option. 
> The fix for ignoring fields is quite easy: just instantiate an empty 
> PrimitiveConverter instead [3].
> Cheers,
> -Kristoffer
> [1]
> JobConf conf = new JobConf(ctx.hadoopConfiguration());
> FileInputFormat.setInputPaths(conf, rawPath);
> ProtoReadSupport.setProtobufClass(conf, Msg.class.getName());
> NewHadoopRDD<Void, Msg.Builder> rdd =
>       new NewHadoopRDD(ctx.sc(), ProtoParquetInputFormat.class, Void.class,
>           Msg.Builder.class, conf);
> rdd.toJavaRDD().foreach(log -> {
>   System.out.println(log._2);
> });
> [2] 
> https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java#L84
> [3] converters[parquetFieldIndex - 1] = new PrimitiveConverter() {};
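
To make the proposal concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use the parquet-mr classes: FieldConverter, IGNORE, buildConverters and the ignoreUnknownFields flag are hypothetical stand-ins, and IllegalStateException stands in for IncompatibleSchemaModificationException. The sketch only shows the idea of swapping in a do-nothing converter for a field the protobuf class does not know about.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnknownFieldSketch {

    /** Hypothetical stand-in for a Parquet field converter. */
    interface FieldConverter {
        void add(Object value);
    }

    /** Drops every value; plays the role of the empty PrimitiveConverter in [3]. */
    static final FieldConverter IGNORE = value -> { };

    /**
     * Builds one converter per Parquet field. Fields missing from the protobuf
     * side are either ignored or rejected, depending on the (assumed) flag.
     */
    static FieldConverter[] buildConverters(List<String> parquetFields,
                                            Set<String> protoFields,
                                            Map<String, Object> target,
                                            boolean ignoreUnknownFields) {
        FieldConverter[] converters = new FieldConverter[parquetFields.size()];
        for (int i = 0; i < converters.length; i++) {
            String name = parquetFields.get(i);
            if (protoFields.contains(name)) {
                // Known field: store the value on the target record.
                converters[i] = value -> target.put(name, value);
            } else if (ignoreUnknownFields) {
                // Unknown field: silently discard its values.
                converters[i] = IGNORE;
            } else {
                // Current behaviour: fail on the first unknown field.
                throw new IllegalStateException(
                        "Field \"" + name + "\" not found in protobuf class");
            }
        }
        return converters;
    }

    public static void main(String[] args) {
        List<String> parquetFields = Arrays.asList("id", "name", "addedLater");
        Set<String> protoFields = new HashSet<>(Arrays.asList("id", "name"));
        Map<String, Object> record = new LinkedHashMap<>();

        FieldConverter[] converters =
                buildConverters(parquetFields, protoFields, record, true);
        converters[0].add(1);
        converters[1].add("a");
        converters[2].add("dropped"); // unknown field, ignored

        System.out.println(record); // prints {id=1, name=a}
    }
}
```

With the flag set to false the same call throws on "addedLater", which mirrors the behaviour reported in this issue.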



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
