Hi Pratyaksh,
This is expected. You need to pass a schema provider since you are using an Avro
source. For Row-based sources, DeltaStreamer can deduce the schema from the Row type
information available from the Spark Dataset.
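For example, you can point DeltaStreamer at Avro schema files via the file-based
schema provider. The class and property names below are from memory for 0.4.7, and the
.avsc paths are placeholders, so please verify against your build:

Add to the spark-submit command:
  --schemaprovider-class com.uber.hoodie.utilities.schema.FilebasedSchemaProvider

And in fg-kafka-source.properties:
  hoodie.deltastreamer.schemaprovider.source.schema.file=hdfs://path/to/source.avsc
  hoodie.deltastreamer.schemaprovider.target.schema.file=hdfs://path/to/target.avsc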
Balaji.V
On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma
<pratyaks...@gmail.com> wrote:

Hi,

I am trying to build a CDC pipeline using Hudi, working off the tag hoodie-0.4.7.
Here is the command I used for running DeltaStreamer -

spark-submit --files jaas.conf \
  --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
  --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
  --master yarn --deploy-mode cluster --num-executors 2 \
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hoodie-utilities-0.4.7.jar \
  --storage-type COPY_ON_WRITE \
  --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource \
  --source-ordering-field xxxx \
  --target-base-path hdfs://path/to/cow_table \
  --target-table cow_table \
  --props hdfs://path/to/fg-kafka-source.properties \
  --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer \
  --spark-master yarn-cluster \
  --source-limit 5000

Basically, I have not passed any SchemaProvider class in the command. When I
run the above command, I get the exception below in SourceFormatAdapter and
the job gets killed -

java.lang.NullPointerException
  at com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94)
  at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224)
  at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504)

In the HoodieDeltaStreamer class, we try to instantiate RowBasedSchemaProvider
before registering Avro schemas if the schemaProvider variable is null.
Hence I am trying to understand whether the above exception is expected
behaviour.
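To make the code path I am describing concrete, here is a hypothetical,
self-contained simplification (stub classes written by me, not the actual
com.uber.hoodie 0.4.7 source) of how a null schemaProvider leads to the NPE:

// Hypothetical stubs, only to illustrate the null-provider path; not Hudi source code.
public class NullSchemaProviderRepro {

  // Stand-in for a SchemaProvider implementation; stays null when no
  // --schemaprovider-class is supplied on the command line.
  static class SchemaProvider {
    String getSourceSchema() {
      return "avro-schema";
    }
  }

  // Stand-in for SourceFormatAdapter: converting Avro records to rows
  // needs the source schema from the provider.
  static class SourceFormatAdapter {
    private final SchemaProvider schemaProvider;

    SourceFormatAdapter(SchemaProvider schemaProvider) {
      this.schemaProvider = schemaProvider; // no null check here
    }

    void fetchNewDataInRowFormat() {
      // Dereferencing a null provider throws the NullPointerException.
      String schema = schemaProvider.getSourceSchema();
      System.out.println("converting Avro payloads to rows with schema: " + schema);
    }
  }

  public static void main(String[] args) {
    SchemaProvider provider = null; // mimics running without --schemaprovider-class
    new SourceFormatAdapter(provider).fetchNewDataInRowFormat(); // throws NPE
  }
}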

Please help.