[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986645#comment-14986645 ]
Wu Xiang commented on PARQUET-388:
----------------------------------
Hi Reuben,
Yes, I agree that the current record could be either a Message or a Message.Builder.
The problem is that, in my usage, the 'buildBefore' flag is always false,
which means the Builder instance is always returned.
More details:
I'm implementing a wrapper for Cascading to read/write Parquet/Protobuf data,
and I use the default ProtoReadSupport class for that.
The signature of ProtoReadSupport is as follows; only the Message type is
declared as the bound of the type parameter.
{code:title=ProtoReadSupport.java|borderStyle=solid}
public class ProtoReadSupport<T extends Message> extends ReadSupport<T> { }
{code}
However, the RecordMaterializer inside ProtoReadSupport actually returns a
Message.Builder. This is where the type inconsistency comes from.
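A minimal, self-contained sketch of why this mismatch does not fail where the cast is written. MessageOrBuilder, Message, Builder, Converter and Person here are hypothetical stand-ins that only mimic the shape of the protobuf and parquet-mr classes (assumption: the converter's T is bounded by a MessageOrBuilder-like type, not by Message itself). Because (T) is an unchecked cast that is erased at runtime, returning the builder succeeds inside getCurrentRecord(), and the ClassCastException only surfaces at the caller, where the compiler inserts a check against Message.

{code:title=CastDemo.java|borderStyle=solid}
// Hypothetical stand-ins that mimic protobuf's Message / Message.Builder split.
interface MessageOrBuilder {}

interface Message extends MessageOrBuilder {}

interface Builder extends MessageOrBuilder {
  Message build();
}

// Hypothetical converter mirroring the getCurrentRecord() quoted in PARQUET-388.
class Converter<T extends MessageOrBuilder> {
  private final Builder reusedBuilder;
  private final boolean buildBefore;

  Converter(Builder reusedBuilder, boolean buildBefore) {
    this.reusedBuilder = reusedBuilder;
    this.buildBefore = buildBefore;
  }

  @SuppressWarnings("unchecked")
  T getCurrentRecord() {
    // Both casts are unchecked: T erases to MessageOrBuilder, so neither fails here.
    if (buildBefore) {
      return (T) reusedBuilder.build();
    } else {
      return (T) reusedBuilder;
    }
  }
}

public class CastDemo {
  static final class Person implements Message {}

  static final class PersonBuilder implements Builder {
    @Override
    public Message build() {
      return new Person();
    }
  }

  /** Returns true if reading the current record as a Message throws ClassCastException. */
  static boolean failsAsMessage(boolean buildBefore) {
    Converter<Message> converter = new Converter<>(new PersonBuilder(), buildBefore);
    try {
      // The compiler inserts a checkcast to Message at this assignment.
      Message message = converter.getCurrentRecord();
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("buildBefore=false fails: " + failsAsMessage(false)); // true
    System.out.println("buildBefore=true fails: " + failsAsMessage(true));   // false
  }
}
{code}

With buildBefore left at false, reading the record as a Message throws; with it set to true, the built Person is returned and the caller's type check passes.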
To make it work, I had to add a hack that turns on the 'buildBefore' flag
of the materializer's root converter (ProtoRecordConverter):
{code:title=ProtoReadSupport.java|borderStyle=solid}
@Override
public RecordMaterializer<T> prepareForRead(Configuration configuration,
    Map<String, String> keyValueMetaData, MessageType fileSchema,
    ReadContext readContext) {
  String headerProtoClass = keyValueMetaData.get(PB_CLASS);
  String configuredProtoClass = configuration.get(PB_CLASS);

  // A class set in the configuration overrides the one stored in the file footer.
  if (configuredProtoClass != null) {
    LOG.debug("Replacing class " + headerProtoClass + " by " + configuredProtoClass);
    headerProtoClass = configuredProtoClass;
  }

  if (headerProtoClass == null) {
    throw new RuntimeException("I Need parameter " + PB_CLASS + " with Protocol Buffer class");
  }

  LOG.debug("Reading data with Protocol Buffer class " + headerProtoClass);

  MessageType requestedSchema = readContext.getRequestedSchema();
  Class<? extends Message> protobufClass = Protobufs.getProtobufClass(headerProtoClass);

  ProtoRecordMaterializer<T> materializer =
      new ProtoRecordMaterializer<T>(requestedSchema, protobufClass);
  // Workaround: make the root converter build the Message before returning it,
  // so getCurrentRecord() yields a T (a Message) instead of a Message.Builder.
  ((ProtoRecordConverter<T>) materializer.getRootConverter()).setBuildBefore(true);
  return materializer;
}
{code}
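An alternative to flipping the flag through a cast on the root converter would be to let the converter decide from a runtime Class token for T. The sketch below is hypothetical (TypedConverter and its constructor are not parquet-mr API) and uses stand-in types rather than the real protobuf classes. It returns the reused builder only when the requested type accepts a builder, and otherwise builds the message; Class.cast also makes any remaining mismatch fail inside the converter rather than at the caller.

{code:title=TypedConverterDemo.java|borderStyle=solid}
// Hypothetical stand-ins for protobuf's Message / Message.Builder.
interface MessageOrBuilder {}

interface Message extends MessageOrBuilder {}

interface Builder extends MessageOrBuilder {
  Message build();
}

// Hypothetical converter that chooses "build or not" from the requested record type.
class TypedConverter<T extends MessageOrBuilder> {
  private final Builder reusedBuilder;
  private final Class<T> recordClass;

  TypedConverter(Builder reusedBuilder, Class<T> recordClass) {
    this.reusedBuilder = reusedBuilder;
    this.recordClass = recordClass;
  }

  T getCurrentRecord() {
    // If the caller asked for a builder-compatible type, hand back the reused builder.
    if (recordClass.isInstance(reusedBuilder)) {
      return recordClass.cast(reusedBuilder);
    }
    // Otherwise build a Message; Class.cast checks the type here, not at the caller.
    return recordClass.cast(reusedBuilder.build());
  }
}

public class TypedConverterDemo {
  static final class Person implements Message {}

  static final class PersonBuilder implements Builder {
    @Override
    public Message build() {
      return new Person();
    }
  }

  public static void main(String[] args) {
    PersonBuilder builder = new PersonBuilder();

    TypedConverter<Message> asMessage = new TypedConverter<>(builder, Message.class);
    Message m = asMessage.getCurrentRecord();
    System.out.println(m instanceof Person); // true

    TypedConverter<Builder> asBuilder = new TypedConverter<>(builder, Builder.class);
    Builder b = asBuilder.getCurrentRecord();
    System.out.println(b == builder); // true
  }
}
{code}

In ProtoReadSupport, the protobuf class already resolved from PB_CLASS could plausibly serve as such a token, so a Message subclass would always be built; that wiring is an assumption, not existing parquet-mr behavior.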
> ProtoRecordConverter might wrongly cast a Message.Builder to Message
> --------------------------------------------------------------------
>
> Key: PARQUET-388
> URL: https://issues.apache.org/jira/browse/PARQUET-388
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Wu Xiang
> Assignee: Reuben Kuhnert
>
> ProtoRecordConverter returns the current record as follows:
> {code}
> public T getCurrentRecord() {
>   if (buildBefore) {
>     return (T) this.reusedBuilder.build();
>   } else {
>     return (T) this.reusedBuilder;
>   }
> }
> {code}
> However, this might fail if T is a subclass of Message and buildBefore == false,
> since it is actually casting a Message.Builder instance to the Message type.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)