bryanburke commented on issue #3641: URL: https://github.com/apache/hudi/issues/3641#issuecomment-929262475
@xushiyan Thank you for your response! I had no idea Hudi provides event-driven features, so your suggestions are helping me learn quite a bit more about the framework. While I do not believe we have a use case currently for `SourceCommitCallback` and `S3EventsSource` (see below), we may in the future if we start processing streaming data or require a long-running Spark cluster.

> Please check out org.apache.hudi.utilities.callback.SourceCommitCallback and its implementing classes. This would allow you to trigger downstream jobs or logic to run.

Reading over my original post above, I believe I did a somewhat poor job defining our exact use case. I can provide some more details for context:

- We process data in batches on a schedule.
- We do not have a long-running Spark cluster.
- ETL jobs run on transient Spark clusters (e.g., Amazon EMR/AWS Glue).
- By the time a downstream job runs, the cluster that ran the prerequisite job does not necessarily exist anymore.

Given the above, I do not believe `SourceCommitCallback` meets our use case, as the downstream job that reads the Hudi table in S3 runs on a separate schedule and (most likely) a completely different transient Spark cluster. However, please feel free to provide additional insight if I am missing something.

> what Hudi APIs you referring to? you can do all the actions through PySpark

Regarding my other question about exposing Hudi APIs via Python, I am referring to `HoodieTableMetaClient` itself and associated timeline classes (for example, as requested in [HUDI-1998](https://issues.apache.org/jira/browse/HUDI-1998)). However, I suppose the question could also extend to the configuration classes like `HoodieWriteConfig` and `DataSourceWriteOptions` that the 0.9.0 release notes indicate are now preferable over string variables.
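To illustrate the kind of check a downstream batch job on a separate transient cluster might want from Python, here is a minimal sketch that inspects a table's timeline directory for the latest completed commit. It is not a Hudi API. It assumes that completed instants appear under `.hoodie` as `<instant_time>.commit` (or `.deltacommit` / `.replacecommit`), that pending instants carry a `.requested` or `.inflight` suffix, and that instant times are fixed-width timestamps that sort correctly as strings. The function names `latest_completed_instant` and `has_new_commit` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: a completed instant is a file named "<instant_time>.<action>";
# names ending in ".requested" or ".inflight" are treated as not yet complete.
_COMPLETED = re.compile(r"^(\d+)\.(commit|deltacommit|replacecommit)$")


def latest_completed_instant(hoodie_dir: Path) -> Optional[str]:
    """Return the largest completed instant time in hoodie_dir, or None."""
    instants = []
    for entry in hoodie_dir.iterdir():
        match = _COMPLETED.match(entry.name)
        if entry.is_file() and match:
            instants.append(match.group(1))
    # Assumption: instant times are fixed-width, so string order is time order.
    return max(instants, default=None)


def has_new_commit(hoodie_dir: Path, last_seen: Optional[str]) -> bool:
    """True if a completed instant newer than last_seen exists."""
    latest = latest_completed_instant(hoodie_dir)
    return latest is not None and (last_seen is None or latest > last_seen)
```

This version reads a local directory. For a table in S3, the same filename matching would be applied to the object keys listed under the table's `.hoodie/` prefix, and the job would need to persist `last_seen` somewhere between runs.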
