[jira] [Commented] (BEAM-12736) Protobuf schema provider row functions break on camel-case field names

Chris Hinds (Jira) Mon, 23 Aug 2021 03:59:07 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403123#comment-17403123
 ]


Chris Hinds commented on BEAM-12736:
------------------------------------

FWIW, if anyone needs a quick fix while the above PR is under review, I used the 
following class to transform field names at the Beam schema level: 
{code:java}
import com.google.common.base.CaseFormat;
import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.logicaltypes.OneOfType;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.checkerframework.checker.nullness.qual.Nullable;

import java.util.ArrayList;
import java.util.List;


public class CamelProtoMessageSchema extends ProtoMessageSchema {

    @Override
    public <T> @Nullable Schema schemaFor(TypeDescriptor<T> typeDescriptor) {
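        Schema schema = super.schemaFor(typeDescriptor);
        // schemaFor is @Nullable, so guard against a missing schema before renaming
        return schema == null ? null : transformSchema(schema.getFields());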
    }


    private Schema transformSchema(List<Schema.Field> fields) {
        ArrayList<Schema.Field> newFields = new ArrayList<>();
        for (Schema.Field f: fields) {
            newFields.add(
                    f.toBuilder()
                            .setName(transformName(f.getName()))
                            .setType(transformType(f.getType()))
                            .build());
        }
        return new Schema(newFields);
    }


    private String transformName(String name) {
        return CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, name);
    }


    private Schema.FieldType transformType(Schema.FieldType t) {
        // if the field type is a row, it has a schema which we should transform
        Schema rowSchema = t.getRowSchema();
        if (rowSchema != null) {
            return Schema.FieldType.row(transformSchema(rowSchema.getFields()));
        }
        // if the field type is an array, it has an element type that needs transforming
        Schema.FieldType arrayFieldType = t.getCollectionElementType();
        if (arrayFieldType != null) {
            return Schema.FieldType.array(transformType(arrayFieldType));
        }
        // if the field type is a one-of, get its schema, transform it, and make a new one-of
        Schema.LogicalType logicalType = t.getLogicalType();
        if (logicalType instanceof OneOfType) {
            OneOfType oot = (OneOfType) logicalType;
            OneOfType newOot =
                    OneOfType.create(transformSchema(oot.getOneOfSchema().getFields()).getFields());
            return Schema.FieldType.logicalType(newOot);
        }
        // otherwise if none of the above, no type transform is needed
        return t;
    }
}
{code}
Then apply it using:
{code:java}
SerializableFunction myRowFunction =
        new CamelProtoMessageSchema().toRowFunction(new TypeDescriptor<MyDataModel.ProtoPayload>() {});
MyDataModel.ProtoPayload payload = …
Row row = (Row) myRowFunction.apply(payload);
{code}
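
To make the renaming concrete without pulling in Beam or Guava, here is a minimal stdlib-only sketch of roughly what transformName does to each field name in the class above. The class name SnakeCaseNames and the helper toSnakeCase are made up for illustration. It only treats A-Z as word boundaries and is not a drop-in replacement for Guava's CaseFormat, which may differ on edge cases such as a leading capital.

```java
public class SnakeCaseNames {

    // Hypothetical helper (not part of Beam or Guava): rewrites a lowerCamel
    // field name as lower_underscore by starting a new word at each A-Z letter.
    static String toSnakeCase(String name) {
        StringBuilder out = new StringBuilder(name.length() + 4);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                // Only insert a separator when the capital is not the first character.
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("userId"));        // user_id
        System.out.println(toSnakeCase("createdAtMs"));   // created_at_ms
        System.out.println(toSnakeCase("already_snake")); // already_snake
    }
}
```

Names that are already snake-case pass through unchanged, so applying the rename to a mixed schema should be harmless for the fields that followed the style guide.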

> Protobuf schema provider row functions break on camel-case field names
> ----------------------------------------------------------------------
>
>                 Key: BEAM-12736
>                 URL: https://issues.apache.org/jira/browse/BEAM-12736
>             Project: Beam
>          Issue Type: Bug
>          Components: extensions-java-protobuf
>    Affects Versions: 2.31.0
>            Reporter: Chris Hinds
>            Assignee: Chris Hinds
>            Priority: P2
>   Original Estimate: 24h
>          Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> ProtoByteBuddyUtils.protoGetterName() _depends_ on field names being 
> snake-case. But the Protobuf style guide only _recommends_ that field names 
> are so defined.  
> Snake-case is not enforced by protoc and my team have always created proto 
> field names in camel-case (perhaps we didn't understand that protoc would 
> automatically rewrite field names for us). It is likely that we are not alone.
> If one calls a row function against a proto instance whose fields were defined 
> in camel-case, an IllegalArgumentException results from the 
> ProtoByteBuddyUtils snake-case assumption.
> {code:java}
> SerializableFunction myRowFunction = new 
> ProtoMessageSchema().toRowFunction(new 
> TypeDescriptor<MyDataModel.ProtoPayload>() {});
> MyDataModel.ProtoPayload payload = …
> Row row = (Row) myRowFunction.apply(payload);
> {code}
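
The mismatch described in the issue can be illustrated with a small stdlib-only sketch. This is an assumption about the failure mode, not Beam's or protoc's actual code: snakeAssumingGetter is a hypothetical derivation that splits on underscores and lowercases the rest of each piece, while casePreservingGetter is a hypothetical derivation that keeps interior capitals. For a camel-case field the two disagree, which is the kind of disagreement that would leave a getter lookup with nothing to find.

```java
import java.util.Locale;

public class GetterNameMismatch {

    // Hypothetical: builds a getter name assuming the field is snake_case.
    // Each underscore-separated piece gets an upper-case first letter and a
    // lower-cased remainder, so interior capitals are lost.
    static String snakeAssumingGetter(String field) {
        StringBuilder out = new StringBuilder("get");
        for (String piece : field.split("_")) {
            if (piece.isEmpty()) {
                continue;
            }
            out.append(Character.toUpperCase(piece.charAt(0)));
            out.append(piece.substring(1).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    // Hypothetical: builds a getter name that capitalises the first letter and
    // any letter after an underscore, leaving all other characters untouched.
    static String casePreservingGetter(String field) {
        StringBuilder out = new StringBuilder("get");
        boolean upperNext = true;
        for (char c : field.toCharArray()) {
            if (c == '_') {
                upperNext = true;
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Snake-case field: both derivations agree.
        System.out.println(snakeAssumingGetter("user_id"));  // getUserId
        System.out.println(casePreservingGetter("user_id")); // getUserId
        // Camel-case field: the derivations disagree.
        System.out.println(snakeAssumingGetter("userId"));   // getUserid
        System.out.println(casePreservingGetter("userId"));  // getUserId
    }
}
```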



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
