rubenssoto opened a new issue #2810:
URL: https://github.com/apache/hudi/issues/2810
Hello guys,
How does Hudi work with Spark Structured Streaming?
I'm trying to use it; I get no errors, but no files are written.
--- Function to read the Hudi table
def read_hudi_stream(spark_session, read_folder_path, max_files_per_trigger):
    df = (
        spark_session.readStream.format("hudi")
        .load(read_folder_path)
    )
    return df
--- Function to write the stream
def write_parquet_stream_trigger_once(
    spark_data_frame, checkpoint_location_folder, foreach_batch_function
):
    df_write_query = (
        spark_data_frame.writeStream.trigger(once=True)
        .foreachBatch(foreach_batch_function)
        .outputMode("append")
        .option("checkpointLocation", checkpoint_location_folder)
        .start()
    )
    df_write_query.awaitTermination()
I'm using a foreachBatch function, and my Hudi write with its Hudi options is inside that foreachBatch.
Streaming reads from Hudi are a new feature, so I don't know how they are supposed to work.
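For context, a minimal sketch of what such a foreachBatch function could look like. The table name, record key, precombine field, and output path below are hypothetical placeholders, not taken from the issue; the Hudi option keys are the standard datasource write configs. The options are built in a plain helper so they can be inspected without a running Spark session.

```python
def make_hudi_options(table_name, record_key, precombine_key):
    # Build the Hudi writer options for an upsert.
    # These are standard Hudi datasource config keys; the values
    # passed in by the caller are assumptions for illustration.
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_key,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_hudi_batch(batch_df, batch_id):
    # Called by foreachBatch for every micro-batch. batch_df is a
    # regular (non-streaming) DataFrame here, so the normal Hudi
    # batch writer applies. Table name, keys, and path are
    # hypothetical.
    (
        batch_df.write.format("hudi")
        .options(**make_hudi_options("my_table", "id", "updated_at"))
        .mode("append")
        .save("s3://my-bucket/hudi/my_table")
    )
```

This function would then be passed as `foreach_batch_function` to the write helper above, e.g. `write_parquet_stream_trigger_once(df, checkpoint_path, write_hudi_batch)`.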
@pengzhiwei2018
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]