bvolpato commented on code in PR #26286:
URL: https://github.com/apache/beam/pull/26286#discussion_r1170695878


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteReadSchemaTransformProvider.java:
##########
@@ -59,12 +68,52 @@
   public static final Set<String> VALID_DATA_FORMATS =
       Sets.newHashSet(VALID_FORMATS_STR.split(","));
 
+  public static final TupleTag<Row> OUTPUT_TAG = new TupleTag<Row>() {};
+  public static final TupleTag<Row> ERROR_TAG = new TupleTag<Row>() {};
+  public static final Schema ERROR_SCHEMA =
+      Schema.builder().addStringField("error").addNullableByteArrayField("row").build();
+
   @Override
   protected @UnknownKeyFor @NonNull @Initialized Class<PubsubLiteReadSchemaTransformConfiguration>
       configurationClass() {
     return PubsubLiteReadSchemaTransformConfiguration.class;
   }
 
+  public static class ErrorFn extends DoFn<SequencedMessage, Row> {
+    private SerializableFunction<byte[], Row> valueMapper;
+    private Counter errorCounter;
+    private Long errorsInBundle = 0L;
+
+    public ErrorFn(String name, SerializableFunction<byte[], Row> valueMapper) {
+      this.errorCounter = Metrics.counter(PubsubLiteReadSchemaTransformProvider.class, name);
+      this.valueMapper = valueMapper;
+    }
+
+    @ProcessElement
+    public void process(@DoFn.Element SequencedMessage seqMessage, MultiOutputReceiver receiver) {
+      try {
+        receiver
+            .get(OUTPUT_TAG)
+            .output(valueMapper.apply(seqMessage.getMessage().getData().toByteArray()));
+      } catch (Exception e) {
+        errorsInBundle += 1;
+        System.out.println("Error while parsing the element" + e.toString());

Review Comment:
   I think we should use a LOG member and pass the exception to it, so the stack trace is logged, instead of sysout / .toString(). Something like this:
   ```
     LOG.warn("Error while parsing the element", e);
   ```

   (Make sure you add the LOG member first.)

   ```
     private static final Logger LOG = LoggerFactory.getLogger(PubsubLiteReadSchemaTransformProvider.class);
   ```
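
To illustrate why passing the exception as its own argument matters, here is a minimal, dependency-free sketch. It is not the PR's code: it uses `java.util.logging` in place of SLF4J so it runs without extra jars, and the class name `LogExceptionSketch`, the helpers `parseFailure`, `logBothWays`, and `stackTraceOf`, and the failing `Integer.parseInt` call are all hypothetical stand-ins for the real `valueMapper` failure. The point it tries to show is that string concatenation with `e.toString()` keeps only the exception class and message, while handing the throwable to the logger keeps the object (and therefore its stack trace) available to the logging backend.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical demo class; the PR itself would use an SLF4J Logger instead.
class LogExceptionSketch {
  private static final Logger LOG = Logger.getLogger(LogExceptionSketch.class.getName());

  // Stand-in for a valueMapper call that fails while parsing an element.
  static Exception parseFailure() {
    try {
      Integer.parseInt("not-a-number");
      return null;
    } catch (NumberFormatException e) {
      return e;
    }
  }

  // Logs the same failure twice and returns the captured records:
  // index 0 uses string concatenation, index 1 passes the throwable separately.
  static List<LogRecord> logBothWays(Exception e) {
    final List<LogRecord> records = new ArrayList<>();
    Handler capture = new Handler() {
      @Override public void publish(LogRecord record) { records.add(record); }
      @Override public void flush() {}
      @Override public void close() {}
    };
    LOG.setUseParentHandlers(false); // keep the demo from echoing to the console handler
    LOG.addHandler(capture);
    try {
      LOG.warning("Error while parsing the element" + e.toString());
      LOG.log(Level.WARNING, "Error while parsing the element", e);
    } finally {
      LOG.removeHandler(capture);
    }
    return records;
  }

  // Renders the full stack trace, which e.toString() does not include.
  static String stackTraceOf(Throwable t) {
    StringWriter out = new StringWriter();
    t.printStackTrace(new PrintWriter(out, true));
    return out.toString();
  }

  public static void main(String[] args) {
    Exception e = parseFailure();
    List<LogRecord> records = logBothWays(e);
    System.out.println("concatenated: throwable attached = " + (records.get(0).getThrown() != null));
    System.out.println("separate arg: throwable attached = " + (records.get(1).getThrown() != null));
    System.out.println(stackTraceOf(e));
  }
}
```

With SLF4J the equivalent of the second call is the two-argument `LOG.warn(message, e)` form suggested above.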



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.