+1. DeltaStreamer can be much nicer in such cases. Any interest in opening a JIRA/PR for this?

On Mon, Sep 16, 2019 at 2:02 AM [email protected] <[email protected]> wrote:

> Yes, It makes sense to add validations with descriptive messages. Please
> open a ticket and send a PR for this.
> Thanks,
> Balaji.V
>
> On Monday, September 16, 2019, 01:11:12 AM PDT, Pratyaksh Sharma
> <[email protected]> wrote:
>
> Hi Balaji,
>
> I get your point. However I feel in such cases, instead of throwing a Null
> Pointer, we should handle the case gracefully. The exception should be
> thrown with proper user-facing message. Please let me know your thoughts
> on this.
>
> On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan
> <[email protected]> wrote:
>
> > Hi Pratyaksh,
> > This is expected. You need to pass a schema-provider since you are using
> > Avro Sources. For RowBased sources, DeltaStreamer can deduce schema from
> > Row type information available from Spark Dataset.
> > Balaji.V
> >
> > On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma
> > <[email protected]> wrote:
> >
> > Hi,
> >
> > I am trying to build a CDC pipeline using Hudi working on tag
> > hoodie-0.4.7.
> > Here is the command I used for running DeltaStreamer -
> >
> > spark-submit --files jaas.conf \
> >   --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
> >   --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
> >   --master yarn --deploy-mode cluster --num-executors 2 \
> >   --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
> >   /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE \
> >   --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource \
> >   --source-ordering-field xxxx --target-base-path hdfs://path/to/cow_table \
> >   --target-table cow_table --props hdfs://path/to/fg-kafka-source.properties \
> >   --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer \
> >   --spark-master yarn-cluster --source-limit 5000
> >
> > Basically I have not passed any SchemaProvider class in the command. When I
> > run the above command, I get the below exception in SourceFormatAdapter and
> > the job gets killed -
> >
> > java.lang.NullPointerException
> >   at com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94)
> >   at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224)
> >   at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504)
> >
> > In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider
> > before registering Avro Schemas if the schemaProvider variable is null.
> > Hence I am trying to understand if the above exception is expected
> > behaviour.
> >
> > Please help.
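
For reference, the validation discussed in this thread could look roughly like the Java sketch below. It is only an illustration of failing fast with a user-facing message instead of a NullPointerException, not the actual Hudi code: the `SchemaProvider` interface here is a hypothetical stand-in for the real class, and `requireSchemaProvider` is a made-up helper name.

```java
// Hypothetical sketch: none of these names are taken from the Hudi code base.
final class SchemaProviderCheck {

    /** Stand-in for the real schema provider type (assumption, for illustration only). */
    interface SchemaProvider {
        String getSourceSchema();
    }

    private SchemaProviderCheck() {
    }

    /**
     * Returns the provider unchanged when it is present; otherwise throws an
     * exception whose message tells the user what is missing and for which source.
     */
    static SchemaProvider requireSchemaProvider(SchemaProvider provider, String sourceClass) {
        if (provider == null) {
            throw new IllegalArgumentException(
                "No schema provider was configured for source " + sourceClass
                    + ". Avro-based sources cannot infer a schema on their own;"
                    + " please pass a schema provider class when launching the job.");
        }
        return provider;
    }
}
```

Calling such a check once, before the source is first read, would surface the misconfiguration at startup rather than deep inside the sync path.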
