rubenssoto opened a new issue #2810:
URL: https://github.com/apache/hudi/issues/2810
Hello guys,
How does Hudi work with Spark Structured Streaming?
I'm trying to use it; I get no errors, but no files are written.
--- Function to read the Hudi table
def read_hudi_stream(spark_session, read_folder_path, max_files_per_trigger):
    df = (
        spark_session.readStream.format("hudi")
        .load(read_folder_path)
    )
    return df
--- Function to write the stream
def write_parquet_stream_trigger_once(
    spark_data_frame, checkpoint_location_folder, foreach_batch_function
):
    df_write_query = (
        spark_data_frame.writeStream.trigger(once=True)
        .foreachBatch(foreach_batch_function)
        .outputMode("append")
        .option("checkpointLocation", checkpoint_location_folder)
        .start()
    )
    df_write_query.awaitTermination()
I'm using a foreachBatch function, and my Hudi write with its Hudi options is inside that foreachBatch.
Streaming reads from Hudi are a new feature, so I don't know how they are supposed to work.
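For context, a minimal sketch of what such a foreachBatch function could look like. The table name, record key, precombine field, and output path below are hypothetical placeholders, not taken from the issue; the Hudi option keys are the standard datasource write configs. The options are built in a plain helper so they can be inspected without a running Spark session.

```python
def make_hudi_options(table_name, record_key, precombine_key):
    # Build the Hudi writer options for an upsert.
    # These are standard Hudi datasource config keys; the values
    # passed in by the caller are assumptions for illustration.
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_key,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_hudi_batch(batch_df, batch_id):
    # Called by foreachBatch for every micro-batch. batch_df is a
    # regular (non-streaming) DataFrame here, so the normal Hudi
    # batch writer applies. Table name, keys, and path are
    # hypothetical.
    (
        batch_df.write.format("hudi")
        .options(**make_hudi_options("my_table", "id", "updated_at"))
        .mode("append")
        .save("s3://my-bucket/hudi/my_table")
    )
```

This function would then be passed as `foreach_batch_function` to the write helper above, e.g. `write_parquet_stream_trigger_once(df, checkpoint_path, write_hudi_batch)`.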
@pengzhiwei2018
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]