[
https://issues.apache.org/jira/browse/BEAM-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403123#comment-17403123
]
Chris Hinds commented on BEAM-12736:
------------------------------------
FWIW, if anyone needs a quick fix while the above PR is under review, I used the
following class to transform field names at the Beam schema level:
{code:java}
import com.google.common.base.CaseFormat;
import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.logicaltypes.OneOfType;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.checkerframework.checker.nullness.qual.Nullable;
import java.util.ArrayList;
import java.util.List;
public class CamelProtoMessageSchema extends ProtoMessageSchema {

  @Override
  public <T> @Nullable Schema schemaFor(TypeDescriptor<T> typeDescriptor) {
    Schema schema = super.schemaFor(typeDescriptor);
    // schemaFor is declared @Nullable, so pass a missing schema straight through
    if (schema == null) {
      return null;
    }
    return transformSchema(schema.getFields());
  }

  // Rebuild the schema with every field renamed and every field type transformed.
  private Schema transformSchema(List<Schema.Field> fields) {
    ArrayList<Schema.Field> newFields = new ArrayList<>();
    for (Schema.Field f : fields) {
      newFields.add(
          f.toBuilder()
              .setName(transformName(f.getName()))
              .setType(transformType(f.getType()))
              .build());
    }
    return new Schema(newFields);
  }

  // Convert a lower camel-case field name to snake-case.
  private String transformName(String name) {
    return CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, name);
  }

  private Schema.FieldType transformType(Schema.FieldType t) {
    // If the field type is a row, it has a schema which we should transform.
    Schema rowSchema = t.getRowSchema();
    if (rowSchema != null) {
      return Schema.FieldType.row(transformSchema(rowSchema.getFields()));
    }
    // If the field type is an array, its element type needs transforming.
    Schema.FieldType arrayFieldType = t.getCollectionElementType();
    if (arrayFieldType != null) {
      return Schema.FieldType.array(transformType(arrayFieldType));
    }
    // If the field type is a one-of, get its schema, transform it, and make a
    // new one-of.
    Schema.LogicalType logicalType = t.getLogicalType();
    if (logicalType instanceof OneOfType) {
      OneOfType oot = (OneOfType) logicalType;
      OneOfType newOot =
          OneOfType.create(transformSchema(oot.getOneOfSchema().getFields()).getFields());
      return Schema.FieldType.logicalType(newOot);
    }
    // Otherwise, none of the above applies and no type transform is needed.
    return t;
  }
}
{code}
Then apply it using:
{code:java}
SerializableFunction myRowFunction =
    new CamelProtoMessageSchema().toRowFunction(new TypeDescriptor<MyDataModel.ProtoPayload>() {});
MyDataModel.ProtoPayload payload = …
Row row = (Row) myRowFunction.apply(payload);
{code}
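For anyone who wants to see what the renaming step in the class above does without pulling in Guava, it can be approximated in plain Java. This is only a sketch: `CamelToSnakeSketch` and `camelToSnake` are hypothetical names (not part of Beam or Guava), and it assumes ASCII lower camel-case input, so edge cases may differ from `CaseFormat`.

```java
final class CamelToSnakeSketch {

  // Hypothetical helper: insert '_' before each uppercase letter (except at
  // index 0) and lowercase it. Assumes ASCII lower camel-case input.
  static String camelToSnake(String name) {
    StringBuilder sb = new StringBuilder(name.length() + 4);
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      if (Character.isUpperCase(c)) {
        if (i > 0) {
          sb.append('_');
        }
        sb.append(Character.toLowerCase(c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(camelToSnake("myFieldName")); // my_field_name
    System.out.println(camelToSnake("my_field"));    // my_field (already snake-case)
  }
}
```

Names that are already snake-case pass through unchanged, so the transform should be safe to apply to schemas that mix both styles.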
> Protobuf schema provider row functions break on camel-case field names
> ----------------------------------------------------------------------
>
> Key: BEAM-12736
> URL: https://issues.apache.org/jira/browse/BEAM-12736
> Project: Beam
> Issue Type: Bug
> Components: extensions-java-protobuf
> Affects Versions: 2.31.0
> Reporter: Chris Hinds
> Assignee: Chris Hinds
> Priority: P2
> Original Estimate: 24h
> Time Spent: 10m
> Remaining Estimate: 23h 50m
>
> ProtoByteBuddyUtils.protoGetterName() _depends_ on field names being
> snake-case. But the Protobuf style guide only _recommends_ that field names
> are so defined.
> Snake-case is not enforced by protoc and my team have always created proto
> field names in camel-case (perhaps we didn't understand that protoc would
> automatically rewrite field names for us). It is likely that we are not alone.
> If one calls a row function against a proto instance whose fields were defined
> in camel-case, an IllegalArgumentException results from the
> ProtoByteBuddyUtils snake-case assumption.
> {code:java}
> SerializableFunction myRowFunction = new
> ProtoMessageSchema().toRowFunction(new
> TypeDescriptor<MyDataModel.ProtoPayload>() {});
> MyDataModel.ProtoPayload payload = …
> Row row = (Row) myRowFunction.apply(payload);
> {code}
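To illustrate the failure mode described in the issue, here is a simplified model of deriving a Java getter name from a proto field name that is assumed to be snake-case. `GetterNameSketch` and `getterFor` are hypothetical and are not the actual `ProtoByteBuddyUtils` code; the sketch only assumes that the conversion capitalizes the first letter of each underscore-separated word and lowercases the rest.

```java
import java.util.Locale;

final class GetterNameSketch {

  // Hypothetical model of a snake-case-only getter derivation:
  // "my_field" -> "getMyField". Not the real Beam implementation.
  static String getterFor(String fieldName) {
    StringBuilder sb = new StringBuilder("get");
    for (String word : fieldName.split("_")) {
      if (word.isEmpty()) {
        continue; // skip empty segments from doubled or leading underscores
      }
      sb.append(Character.toUpperCase(word.charAt(0)));
      sb.append(word.substring(1).toLowerCase(Locale.ROOT));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(getterFor("my_field")); // getMyField
    System.out.println(getterFor("myField"));  // getMyfield
  }
}
```

With a camel-case field such as `myField`, the derived name no longer matches the `getMyField` accessor that protoc generates, which is consistent with the lookup failure reported above.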