brucearctor commented on PR #29368:
URL: https://github.com/apache/beam/pull/29368#issuecomment-1805146921

   > I believe some other Go PR broke the Python Docker tests. @brucearctor , 
this PR enables Kafka Beam YAML to read and write proto.
   > 
   > Reading is 'easy' because the user has to provide the file descriptor and 
the message name so we can build the Row from the bytes.
   > 
   > Writing was a bit more challenging. I initially had this method to write 
to proto by just providing the Row schema:
   > 
   > ```
   > public static SerializableFunction<Row, byte[]> getRowToProtoBytes() {
   >   return new SimpleFunction<Row, byte[]>() {
   >     @Override
   >     public byte[] apply(Row input) {
   >       // Serializes the Row as Beam's own SchemaApi.Row message.
   >       SchemaApi.Row rowProto = SchemaTranslation.rowToProto(input);
   >       return rowProto.toByteArray();
   >     }
   >   };
   > }
   > ```
   > 
   > However, this generates a SchemaApi.Row proto. This implies that if 
another system intends to read these protos, one must construct the proto using 
the SchemaApi file descriptor, etc. This approach doesn't make much sense, as 
we don't want users to remember this as the output.
   > 
   > As a result, I am also requesting the file descriptor and message name for 
the output. This enables other systems to access the output proto without 
needing to remember intricate details. While this description may be dense, I 
believe it makes sense. I tested it with Dataflow (both read and write), and it 
worked fine.
   
   Finally finding a few moments to look at this a bit more closely.
   
   (a) As mentioned, the general approach makes sense to me.
   
   (b) I imagine you are correct about the failing tests, though I haven't dug into 
that yet.
   
   I don't know how much of this I highlighted to you previously, but Scio had a 
creative way of handling proto bytes. Do check out 
https://spotify.github.io/scio/io/Protobuf.html and related pages, just for 
additional context on another way this could be handled. I think there are 
pros and cons to each approach.
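
To make the point about writing concrete: the bytes on the wire are only useful to another system if they follow a message layout that system already knows, which is why asking for the file descriptor and message name on output seems reasonable. The sketch below is not the PR's code and does not use Beam or the protobuf library. It is a standard-library-only toy in which a hypothetical `fieldNumbers` map stands in for the user-supplied descriptor, and `ProtoWireSketch`, `encode`, and `writeVarint` are names made up for illustration. It handles only `Long` (varint) and `String` (length-delimited) values; real code would more likely build a `DynamicMessage` from the descriptor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy encoder: a field-name to field-number map stands in for a real descriptor. */
final class ProtoWireSketch {

  // Protobuf wire types used in this sketch.
  private static final int WIRE_VARINT = 0; // int64 and similar scalar types
  private static final int WIRE_LEN = 2;    // strings, bytes, nested messages

  /** Writes a base-128 varint, least significant 7-bit group first. */
  static void writeVarint(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80)); // set the continuation bit
      value >>>= 7;
    }
    out.write((int) value);
  }

  /** Encodes the row using the field numbers from the (hypothetical) user schema. */
  static byte[] encode(Map<String, Integer> fieldNumbers, Map<String, Object> row) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (Map.Entry<String, Object> entry : row.entrySet()) {
      Integer boxed = fieldNumbers.get(entry.getKey());
      if (boxed == null) {
        throw new IllegalArgumentException("Field not in schema: " + entry.getKey());
      }
      int number = boxed.intValue();
      Object value = entry.getValue();
      if (value instanceof Long) {
        writeVarint(out, (number << 3) | WIRE_VARINT); // tag = field number + wire type
        writeVarint(out, (Long) value);
      } else if (value instanceof String) {
        byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (number << 3) | WIRE_LEN);
        writeVarint(out, utf8.length);                 // length prefix
        out.write(utf8, 0, utf8.length);
      } else {
        throw new IllegalArgumentException("Unsupported type for " + entry.getKey());
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // Assumed schema: message Person { int64 id = 1; string name = 2; }
    Map<String, Integer> fieldNumbers = new LinkedHashMap<>();
    fieldNumbers.put("id", 1);
    fieldNumbers.put("name", 2);

    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 150L);
    row.put("name", "hi");

    StringBuilder hex = new StringBuilder();
    for (byte b : encode(fieldNumbers, row)) {
      hex.append(String.format("%02x ", b & 0xFF));
    }
    System.out.println(hex.toString().trim());
  }
}
```

For the assumed `Person` schema, the row `{id: 150, name: "hi"}` encodes to the bytes `08 96 01 12 02 68 69`, which any consumer holding the same `.proto` definition can decode without knowing anything about Beam's `SchemaApi.Row`.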


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

