luongngochoa commented on issue #9132:
URL: https://github.com/apache/hudi/issues/9132#issuecomment-1630043791

@danny0405 I tried this configuration (following [PR #7727](https://github.com/apache/hudi/pull/7727)), but it still does not work:

`hoodie.deltastreamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.JsonToAvroSchemaConverter`
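
For reference, here is the fuller properties sketch I am working from — this converter is paired with the registry schema provider, and the registry URL and subject name below are placeholders for my environment, not the exact values:

```properties
# Hypothetical registry URL and subject name for illustration
hoodie.deltastreamer.schemaprovider.class=org.apache.hudi.utilities.schema.SchemaRegistryProvider
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/my-topic-value/versions/latest
hoodie.deltastreamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.JsonToAvroSchemaConverter
```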
I then tested another configuration and it worked: I changed the source class to AvroKafkaSource, defined the schema in Avro, and produced messages to the topic with AvroSerializer against that Avro schema.
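
For context, the Avro schema I use is a plain `record` type along these lines — the record name is made up here, and the fields are inferred from the sample payload in the log below:

```json
{
  "type": "record",
  "name": "Vehicle",
  "fields": [
    {"name": "type", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "address", "type": "string"},
    {"name": "brand", "type": "string"},
    {"name": "filename", "type": "string"}
  ]
}
```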
However, one problem remains: if I use JsonKafkaSource as the source class and define the schema as an Avro `record` type, the schema itself is parsed, but deserialization fails while decoding, seemingly because the data contains some UTF-8 characters. Even after adding this extra config, I still get an error:

`hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer` (or `KafkaSchemaAvroDeserializer`)

Here are the logs:
```
23/07/06 12:53:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/07/06 12:53:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (10.233.125.104, executor 1, partition 0, PROCESS_LOCAL, 4391 bytes) taskResourceAssignments Map()
23/07/06 12:53:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.233.125.104:45235 (size: 4.3 KiB, free: 110.0 MiB)
23/07/06 12:53:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (10.233.125.104 executor 1): org.apache.hudi.exception.HoodieIOException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (String)"\00\00\00{"type": 1, "name": "\u00d4 T\u00d4 TR\u01af", "address": "L\u00f4", "brand": "PEUGEOT", "filename": "_E0102376505_"[truncated 11 chars]; line: 1, column: 2]
        at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:96)
        at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:87)
        at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at scala.collection.Iterator$SliceIterator.next(Iterator.scala:273)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
        at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
        at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
        at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
        at scala.collection.AbstractIterator.to(Iterator.scala:1431)
        at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
        at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
        at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
        at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
        at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1449)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (String)"\00\00\00{"type": 1, line: 1, column: 2]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:710)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._throwInvalidSpace(ParserMinimalBase.java:688)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2408)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:677)
        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4684)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4586)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
        at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:93)
        ... 30 more
```
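
For what it's worth, the leading `\00` bytes in the log look less like a UTF-8 issue and more like a Confluent Schema Registry wire-format header (one magic byte plus a 4-byte schema ID) still attached to the value when JsonKafkaSource hands it to the JSON parser. A minimal Python sketch — the schema ID 42 is made up — reproduces the same class of failure and shows the document parses fine, non-ASCII characters included, once the 5-byte header is stripped:

```python
import json

# Simulated Kafka value: a Confluent-style wire-format header
# (magic byte 0x00 + 4-byte big-endian schema ID, here a made-up 42)
# followed by the JSON document from the log.
header = b"\x00" + (42).to_bytes(4, "big")
payload = header + json.dumps({"type": 1, "name": "Ô TÔ TRư"}).encode("utf-8")

# Feeding the raw value to a JSON parser fails on the leading NUL,
# which is exactly what Jackson reports as "CTRL-CHAR, code 0".
try:
    json.loads(payload.decode("utf-8", errors="replace"))
    raw_parse_ok = True
except json.JSONDecodeError:
    raw_parse_ok = False

# Dropping the 5-byte header leaves plain UTF-8 JSON that parses fine.
record = json.loads(payload[5:].decode("utf-8"))
```

If that is what's happening, the UTF-8 characters are a red herring — the producer's registry-aware serializer framing is the problem, which would also explain why the AvroKafkaSource + AvroSerializer path works.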

