[ 
https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986645#comment-14986645
 ] 

Wu Xiang commented on PARQUET-388:
----------------------------------


Hi Reuben,

Yes, I agree that the current record can be either a Message or a Message.Builder. 
The problem is that when I use it, the 'buildBefore' flag is always false, 
which means the Builder instance is always returned.

In more detail:
I'm implementing a wrapper for Cascading to read and write Parquet/protobuf data, 
and I use the default ProtoReadSupport class to help with that.

The signature of ProtoReadSupport is as follows; only the Message type is 
declared as the bound of the type parameter.
{code:title=ProtoReadSupport.java|borderStyle=solid}
public class ProtoReadSupport<T extends Message> extends ReadSupport<T> { }
{code}
But the RecordMaterializer inside ProtoReadSupport actually returns a 
Message.Builder. This is where the type inconsistency comes from.

To make it work, I need to add a hack to the RecordMaterializer 
(ProtoRecordConverter) by switching the 'buildBefore' flag on:
{code:title=ProtoReadSupport.java|borderStyle=solid}
    @Override
    public RecordMaterializer<T> prepareForRead(Configuration configuration,
            Map<String, String> keyValueMetaData, MessageType fileSchema,
            ReadContext readContext) {
        String headerProtoClass = keyValueMetaData.get(PB_CLASS);
        String configuredProtoClass = configuration.get(PB_CLASS);

        if (configuredProtoClass != null) {
            LOG.debug("Replacing class " + headerProtoClass + " by " + configuredProtoClass);
            headerProtoClass = configuredProtoClass;
        }

        if (headerProtoClass == null) {
            throw new RuntimeException("I Need parameter " + PB_CLASS + " with Protocol Buffer class");
        }

        LOG.debug("Reading data with Protocol Buffer class " + headerProtoClass);

        MessageType requestedSchema = readContext.getRequestedSchema();
        Class<? extends Message> protobufClass = Protobufs.getProtobufClass(headerProtoClass);
        ProtoRecordMaterializer<T> materializer =
                new ProtoRecordMaterializer<T>(requestedSchema, protobufClass);
        // Hack: force the root converter to return a built Message instead of the Builder.
        ((ProtoRecordConverter<T>) materializer.getRootConverter()).setBuildBefore(true);
        return materializer;
    }
{code}
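
To illustrate why the cast only blows up on the caller's side, here is a 
minimal, self-contained sketch. It does not use the real protobuf or Parquet 
classes: MessageOrBuilder, Message, Builder and Converter below are 
hypothetical stand-ins that only mirror the shape of the getCurrentRecord() 
logic quoted in the issue, and the bound on Converter's type parameter is an 
assumption made for the sketch.

```java
public class BuilderCastSketch {
    // Hypothetical stand-ins for the protobuf types; NOT the real API.
    interface MessageOrBuilder {}
    interface Message extends MessageOrBuilder {}

    static class Builder implements MessageOrBuilder {
        Message build() {
            return new Message() {};
        }
    }

    // Stand-in converter that mirrors the getCurrentRecord() logic from the issue.
    static class Converter<T extends MessageOrBuilder> {
        private final Builder reusedBuilder = new Builder();
        private boolean buildBefore = false;

        void setBuildBefore(boolean buildBefore) {
            this.buildBefore = buildBefore;
        }

        @SuppressWarnings("unchecked")
        T getCurrentRecord() {
            if (buildBefore) {
                return (T) reusedBuilder.build();
            }
            // Unchecked cast: after erasure this only checks MessageOrBuilder,
            // so it succeeds here even when the caller expects a Message.
            return (T) reusedBuilder;
        }
    }

    // Reads one record as a Message and reports whether the cast held.
    public static String tryRead(boolean buildBefore) {
        Converter<Message> converter = new Converter<>();
        converter.setBuildBefore(buildBefore);
        try {
            // The compiler inserts the real cast to Message at this assignment.
            Message record = converter.getCurrentRecord();
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println("buildBefore=false: " + tryRead(false));
        System.out.println("buildBefore=true:  " + tryRead(true));
    }
}
```

In this sketch the buildBefore=false path fails with a ClassCastException at 
the assignment in tryRead(), not inside getCurrentRecord(), while the 
buildBefore=true path succeeds. That matches what I see with the flag hack 
above.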

> ProtoRecordConverter might wrongly cast a Message.Builder to Message
> --------------------------------------------------------------------
>
>                 Key: PARQUET-388
>                 URL: https://issues.apache.org/jira/browse/PARQUET-388
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Wu Xiang
>            Assignee: Reuben Kuhnert
>
> ProtoRecordConverter returns current record as follows:
> {code}
>   public T getCurrentRecord() {
>     if (buildBefore) {
>       return (T) this.reusedBuilder.build();
>     } else {
>       return (T) this.reusedBuilder;
>     }
>   }
> {code}
> However, this might fail if T is a subclass of Message and buildBefore == false, 
> since it is actually casting a Message.Builder instance to the Message type.



