bryanburke commented on issue #3641:
URL: https://github.com/apache/hudi/issues/3641#issuecomment-929262475


   @xushiyan Thank you for your response! I had no idea Hudi provides 
event-driven features, so your suggestions are helping me learn quite a bit 
more about the framework. While I do not believe we have a use case currently 
for `SourceCommitCallback` and `S3EventsSource` (see below), we may in the 
future if we start processing streaming data or require a long-running Spark 
cluster.
   
   > Please check out org.apache.hudi.utilities.callback.SourceCommitCallback 
and its implementing classes. This would allow you to trigger downstream jobs 
or logic to run.
   
   Reading over my original post above, I did a poor job of defining our 
exact use case, so here are some more details for context:
   
   - We process data in batches on a schedule.
   - We do not have a long-running Spark cluster.
   - ETL jobs run on transient Spark clusters (e.g., Amazon EMR/AWS Glue).
   - By the time a downstream job runs, the cluster that ran the prerequisite 
job does not necessarily exist anymore.
   
   Given the above, I do not believe `SourceCommitCallback` fits our use case: 
the downstream job that reads the Hudi table in S3 runs on a separate 
schedule and (most likely) on a completely different transient Spark cluster, 
so nothing would be running to receive the callback. However, please feel 
free to provide additional insight if I am missing something.
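
   To make the constraint concrete, here is a minimal sketch of what a downstream batch job on a separate cluster could do instead of relying on a callback: look at the file names in the table's `.hoodie` folder and check whether a completed commit newer than the last one it processed exists. This is not a Hudi API. It assumes completed instants show up as `<instant time>.commit`, `.deltacommit`, or `.replacecommit` with no `.requested` or `.inflight` suffix, and that all instant times in a table have the same width. The function names and the `last_seen` bookkeeping are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumption: a completed instant is a file named "<instant time>.<action>"
# with no further suffix; pending ones end in ".requested" or ".inflight".
_COMPLETED = re.compile(r"(\d+)\.(commit|deltacommit|replacecommit)")


def completed_instants(file_names: Iterable[str]) -> List[str]:
    """Return the sorted instant times of completed commits (hypothetical helper)."""
    found = set()
    for name in file_names:
        match = _COMPLETED.fullmatch(name)
        if match:
            found.add(match.group(1))
    # Lexicographic order equals chronological order only if all instant
    # times have the same number of digits (assumed here).
    return sorted(found)


def has_new_commit(file_names: Iterable[str], last_seen: Optional[str]) -> bool:
    """True if a completed instant newer than `last_seen` exists."""
    instants = completed_instants(file_names)
    if not instants:
        return False
    return last_seen is None or instants[-1] > last_seen
```

   The file names would come from listing the `.hoodie` folder under the table's base path with whatever S3 client the job already uses. The job would also have to persist `last_seen` somewhere itself.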
   
   > what Hudi APIs you referring to? you can do all the actions through PySpark
   
   Regarding my other question about exposing Hudi APIs via Python, I am 
referring to `HoodieTableMetaClient` itself and associated timeline classes 
(for example, as requested in 
[HUDI-1998](https://issues.apache.org/jira/browse/HUDI-1998)). However, I 
suppose the question could also extend to the configuration classes like 
`HoodieWriteConfig` and `DataSourceWriteOptions` that the 0.9.0 release notes 
indicate are now preferable over string variables.
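
   For context on the configuration point, this is roughly what we do from PySpark today: pass the options as plain string keys, since the Scala/Java config classes are not importable from Python. A minimal sketch, where the helper name and its parameters are hypothetical and only the option keys are Hudi's:

```python
from typing import Dict


def hudi_write_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    operation: str = "upsert",
) -> Dict[str, str]:
    """Build Hudi datasource write options as string keys (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
    }


# Typical use from a PySpark job (not executed here, needs Spark and Hudi):
#   opts = hudi_write_options("orders", "order_id", "updated_at")
#   df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

   A typo in one of these keys only surfaces at runtime, if at all, which is why having the config classes available from Python would help.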



