luongngochoa commented on issue #9132: URL: https://github.com/apache/hudi/issues/9132#issuecomment-1630043791
@danny0405 I tried this configuration but it still does not work (following [#7727](https://github.com/apache/hudi/pull/7727)):

```
hoodie.deltastreamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.JsonToAvroSchemaConverter
```

I then tested another configuration and it worked: I changed the source class to `AvroKafkaSource`, defined the schema in Avro, and produced messages to the topic with the Avro serializer against that schema.

One problem remains: if I use `JsonKafkaSource` as the source class and define its schema as an Avro `record` type, the schema is parsed, but deserialization fails while decoding the data because it contains some UTF-8 characters. It still fails even after I added this config:

```
hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
```

(I also tried `KafkaSchemaAvroDeserializer`.)

These are the logs:

```
23/07/06 12:53:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/07/06 12:53:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (10.233.125.104, executor 1, partition 0, PROCESS_LOCAL, 4391 bytes) taskResourceAssignments Map()
23/07/06 12:53:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.233.125.104:45235 (size: 4.3 KiB, free: 110.0 MiB)
23/07/06 12:53:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (10.233.125.104 executor 1): org.apache.hudi.exception.HoodieIOException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (String)"\00\00\00{"type": 1, "name": "\u00d4 T\u00d4 TR\u01af", "address": "L\u00f4", "brand": "PEUGEOT", "filename": "_E0102376505_"[truncated 11 chars]; line: 1, column: 2]
	at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:96)
	at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:87)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator$SliceIterator.next(Iterator.scala:273)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1449)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (String)"\00\00\00{"type": 1, line: 1, column: 2]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:710)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._throwInvalidSpace(ParserMinimalBase.java:688)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2408)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:677)
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4684)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4586)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
	at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:93)
	... 30 more
```
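One observation on the log, in case it helps: the failure at line 1, column 2 is a NUL byte (`CTRL-CHAR, code 0`), and the source string starts with `\00\00\00{`. That pattern looks like the Confluent wire format, which prepends a magic byte (0x00) plus a 4-byte schema-registry ID to each Avro-serialized message, so a JSON parser chokes before it ever reaches the UTF-8 text. A minimal Python sketch to illustrate (the payload and `schema_id` are made up for the demo, not taken from my topic):

```python
import json

# Confluent's Avro wire format: 1 magic byte (0x00) + 4-byte schema ID,
# then the serialized payload. The schema_id value here is hypothetical.
schema_id = 1
header = b"\x00" + schema_id.to_bytes(4, "big")  # b"\x00\x00\x00\x00\x01"
payload = header + '{"type": 1, "name": "Ô TÔ"}'.encode("utf-8")

# Feeding the framed payload to a JSON parser fails on the leading NUL,
# matching the "Illegal character (CTRL-CHAR, code 0)" in the log.
try:
    json.loads(payload.decode("utf-8"))
except json.JSONDecodeError as exc:
    print("raw payload fails to parse, as in the log:", exc)

# Stripping the 5-byte header first lets the UTF-8 JSON text parse fine.
record = json.loads(payload[5:].decode("utf-8"))
print(record["name"])
```

If that reading is right, the non-ASCII characters are a red herring: the messages were framed by the Avro serializer, so `JsonKafkaSource` with a plain string deserializer cannot parse them as JSON no matter what schema is configured.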
