WarFox commented on code in PR #9059:
URL: https://github.com/apache/hudi/pull/9059#discussion_r1245014800


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -639,7 +639,13 @@ private Pair<SchemaProvider, Pair<String, JavaRDD<HoodieRecord>>> fetchFromSourc
             BuiltinKeyGenerator builtinKeyGenerator = (BuiltinKeyGenerator) HoodieSparkKeyGeneratorFactory.createKeyGenerator(props);
             List<HoodieRecord> avroRecords = new ArrayList<>();
             while (genericRecordIterator.hasNext()) {
-              GenericRecord genRec = genericRecordIterator.next();
+              GenericRecord genRec = null;
+              try {
+                genRec = genericRecordIterator.next();
+              } catch (IllegalArgumentException e) {
+                LOG.warn("Handling exception for transaction topic  -  " + e.getMessage());
+                break;

Review Comment:
   @nsivabalan I don't think there is a reliable Kafka configuration that we 
can use to deduce this. It is up to the Kafka producer to initiate a 
transaction and mark where it begins and ends. If Kafka Streams is used, there 
is a 
[configuration](https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/StreamsConfig.html),
 `processing.guarantee`, which can be set to `exactly_once` for transactional 
topics. But I am not sure how Hudi could get at this configuration.
   
   Also, note that a single topic may contain both transactional and 
non-transactional messages, depending on the application logic.
   
   The team using Hudi Deltastreamer will know whether their topic has 
transactional messages. I would say the safer bet is to introduce a Hudi 
configuration for this. That gives the team the flexibility to mark their 
topic as transactional or not.
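   
   To make the suggestion concrete, here is a minimal, self-contained sketch 
of how such an opt-in flag could gate the `break` from the diff. The config key 
`hoodie.deltastreamer.source.kafka.transactional.topic`, the class name, and 
the `drain` and `failingAfter` helpers are all hypothetical and only for 
illustration; none of them is an existing Hudi API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

class TransactionalTopicSketch {
  // Hypothetical key, NOT an existing Hudi configuration.
  static final String TRANSACTIONAL_TOPIC_KEY =
      "hoodie.deltastreamer.source.kafka.transactional.topic";

  // Collects records until the iterator is exhausted. An
  // IllegalArgumentException from next() ends the read early only when the
  // user has marked the topic as transactional; otherwise it is rethrown.
  static <T> List<T> drain(Iterator<T> records, Properties props) {
    boolean transactional =
        Boolean.parseBoolean(props.getProperty(TRANSACTIONAL_TOPIC_KEY, "false"));
    List<T> out = new ArrayList<>();
    while (records.hasNext()) {
      T rec;
      try {
        rec = records.next();
      } catch (IllegalArgumentException e) {
        if (!transactional) {
          throw e; // default behaviour: surface the failure
        }
        break; // opted in: stop reading, as the diff does
      }
      out.add(rec);
    }
    return out;
  }

  // Demo helper: yields the given items, then throws from next() to simulate
  // the exception that the diff catches.
  static <T> Iterator<T> failingAfter(List<T> items) {
    Iterator<T> inner = items.iterator();
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return true;
      }

      @Override
      public T next() {
        if (inner.hasNext()) {
          return inner.next();
        }
        throw new IllegalArgumentException("simulated failure");
      }
    };
  }
}
```

   With the flag unset, the exception still propagates, so teams that have not 
opted in would keep the current behaviour.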
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
