hudi-bot opened a new issue, #15942:
URL: https://github.com/apache/hudi/issues/15942

   With a HoodieDeltaStreamer configured to decode JSON records, a single 
non-JSON record causes the job to fail, with no way to skip the record or move 
past it. ClickHouse's Kafka engine solves this with an optional "skip invalid 
records" mode, which seems like it would be helpful here as well.
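
To make the requested behaviour concrete, here is a minimal Python sketch. It is not Hudi or ClickHouse code: the function name `decode_json_records` and the `skip_invalid` flag are hypothetical and only illustrate the difference between failing on the first bad record and setting it aside.

```python
import json
from typing import Any, Iterable, List, Tuple


def decode_json_records(
    raw_records: Iterable[str], skip_invalid: bool = False
) -> Tuple[List[Any], List[str]]:
    """Decode each raw record as JSON.

    With skip_invalid=False the first malformed record raises, which mirrors
    the fail-fast behaviour described in this issue. With skip_invalid=True
    malformed records are collected separately instead of aborting the batch.
    (Both the function and the flag are hypothetical, for illustration only.)
    """
    decoded: List[Any] = []
    rejected: List[str] = []
    for raw in raw_records:
        try:
            decoded.append(json.loads(raw))
        except json.JSONDecodeError:
            if not skip_invalid:
                # Fail fast: propagate the parse error to the caller.
                raise
            # Skip mode: keep the bad payload so it can be logged or inspected.
            rejected.append(raw)
    return decoded, rejected
```

Returning the rejected payloads alongside the decoded ones is one possible design choice: it lets the caller log or quarantine bad records rather than dropping them silently.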
   
   It's not clear to me whether `--commit-on-errors` is meant to cover this 
case, but in practice it does not appear to.
   
   On failure, the following stack trace is logged and the process exits:
   
   {{23/07/03 14:49:23 INFO DAGScheduler: ShuffleMapStage 2 (mapToPair at HoodieJavaRDD.java:135) failed in 2144.592 s due to Job aborted due to stage failure: Task 14 in stage 2.0 failed 1 times, most recent failure: Lost task 14.0 in stage 2.0 (TID 16) (executor driver): org.apache.hudi.exception.HoodieIOException: Unrecognized token 'testvalue': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')}}
   {{ at [Source: (String)"testvalue"; line: 1, column: 10]}}
   {{        at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:96)}}
   {{        at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:87)}}
   {{        at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)}}
   {{        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)}}
   {{        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)}}
   {{        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)}}
   {{        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)}}
   {{        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)}}
   {{        at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)}}
   {{        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)}}
   {{        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)}}
   {{        at org.apache.spark.scheduler.Task.run(Task.scala:136)}}
   {{        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)}}
   {{        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)}}
   {{        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)}}
   {{        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)}}
   {{        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)}}
   {{        at java.base/java.lang.Thread.run(Thread.java:829)}}
   {{Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'testvalue': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')}}
   {{ at [Source: (String)"testvalue"; line: 1, column: 10]}}
   {{        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)}}
   {{        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:745)}}
   {{        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2961)}}
   {{        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2939)}}
   {{        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._matchToken(ReaderBasedJsonParser.java:2713)}}
   {{        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._matchTrue(ReaderBasedJsonParser.java:2667)}}
   {{        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:767)}}
   {{        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)}}
   {{        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)}}
   {{        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)}}
   {{        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597)}}
   {{        at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:93)}}
   {{        ... 17 more}}
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6485
   - Type: Bug
   - Affects version(s):
     - 0.12.3
     - 0.13.1

