lostluck commented on issue #28959: URL: https://github.com/apache/beam/issues/28959#issuecomment-1767021739
This does seem like a bug in the batch partitioned reads. The failing line SDK side: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/spannerio/read_batch.go#L70 Appears to be sending in a bad BatchReadOnlyTransactionID from the partitionedRead, ------------ And the issue appears to be because those spanner types don't JSON serialize, they BinaryMarshals. So of course the type is bad on encode/decode. Beam doesn't automatically handle binary marshalling, and neither does the final fallback JSON encoding. https://pkg.go.dev/cloud.google.com/go/spanner#BatchReadOnlyTransactionID https://pkg.go.dev/cloud.google.com/go/spanner#Partition The fastest fix is to register encoder and decoder functions for the type. So what probably needs to be done is to register a coder for `spanner.partitionedRead`, since that field is not getting encoded properly. eg. Similar to the following should be added to the spannerio package, but with error handling added in, and pseudo code filled in. ``` func init() { beam.RegisterCoder(reflect.TypeOf(partitionedRead{}), encodePartitionedRead, decodePartitionedRead) } func encodePartitionedRead(v partitionedRead) []byte { a, _:= v.BatchTransactionId.MarshalBinary() b, _ := v.Partition.MarshalBinary() // length prefix the first byte set, and return the bytesparameter } func decodePartitionedRead(bytes []byte) partitionedRead { var ret partitionedRead // Decode the length prefix n so we can split the fields. _ = ret.BatchTransactionId.UnarshalBinary(bytes[:n]) _ = ret.Partition.UnarshalBinary(bytes[n:]) return ret } ``` See the doc for some light documentation https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#RegisterCoder . It still claims json is the default coder, but that's not been true for a bit. The rest of it remains valid however. If one of you two @pequalsnp @azach were to try it and validate it for your usecase(s), that would be a big help versus my simple guess. IOs are hard because one needs to be an expert in both beam and the datasource them to write them well. I'm very happy to review/merge any changes that work for y'all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
