Re: [PR] [YAML] - Kafka write and RAW format [beam]

via GitHub Mon, 30 Oct 2023 14:40:00 -0700


ffernandez92 commented on code in PR #29160:
URL: https://github.com/apache/beam/pull/29160#discussion_r1376830896



##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriteSchemaTransformProvider.java:
##########
@@ -131,10 +134,18 @@ public void finish() {
     @Override
     public PCollectionRowTuple expand(PCollectionRowTuple input) {
       Schema inputSchema = input.get("input").getSchema();
-      final SerializableFunction<Row, byte[]> toBytesFn =
-          configuration.getFormat().equals("JSON")
-              ? JsonUtils.getRowToJsonBytesFunction(inputSchema)
-              : AvroUtils.getRowToAvroBytesFunction(inputSchema);
+      final SerializableFunction<Row, byte[]> toBytesFn;
+      if (configuration.getFormat().equals("RAW")) {
+        if (!inputSchema.hasField(PAYLOAD)) {

Review Comment:
   Yes, that's a tricky one. Kafka doesn't have the same attribute mechanism as 
PubSub. Maybe we could let the user decide which field they want to pull from 
the Row. Then we could add a check like this:
   
   ```
       if len(field_names) != 1:
         raise ValueError(f'Expecting exactly one field, found {field_names}')
   ```
   ^ From the PubSub implementation. Then we would call 
input.getBytes(name_of_the_field_that_we_want_to_pull). I guess we need to come 
up with a good name for that option, something like getRowField. What do you think?
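
   To make the idea concrete, here is a rough Java sketch of the selection 
logic. It is only an illustration under assumptions: a plain Map stands in for 
the Beam Row so the snippet runs without Beam on the classpath, and the names 
RawPayloadSelector and selectPayload are hypothetical, not existing Beam API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a Map<String, Object> stands in for a Beam Row here.
final class RawPayloadSelector {
  private RawPayloadSelector() {}

  // No field configured: require exactly one field, like the PubSub check quoted above.
  static byte[] selectPayload(Map<String, Object> row) {
    List<String> fieldNames = new ArrayList<>(row.keySet());
    if (fieldNames.size() != 1) {
      throw new IllegalArgumentException(
          "Expecting exactly one field, found " + fieldNames);
    }
    return selectPayload(row, fieldNames.get(0));
  }

  // Field configured by the user: pull that field and require it to hold bytes.
  static byte[] selectPayload(Map<String, Object> row, String fieldName) {
    if (!row.containsKey(fieldName)) {
      throw new IllegalArgumentException(
          "Field '" + fieldName + "' not found in " + row.keySet());
    }
    Object value = row.get(fieldName);
    if (!(value instanceof byte[])) {
      throw new IllegalArgumentException(
          "Field '" + fieldName + "' must be of type bytes for the RAW format");
    }
    return (byte[]) value;
  }
}
```

   In the real transform the same two checks would presumably run against the 
input Schema at expand time rather than per element, so a misconfigured 
pipeline fails before any record is written.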
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

