kazdy commented on issue #3724: URL: https://github.com/apache/hudi/issues/3724#issuecomment-954619903
Thanks for the answer :)

So there's no feature for setting something like a starting offset when using the streaming Hudi source, although checkpointing is available. As far as I understand the code, the streaming source does not reuse the configs provided under "hoodie.datasource", so these are simply ignored. What I'm after is the same thing Flink offers with read.streaming.start-commit (https://hudi.apache.org/docs/querying_data/#flink-sql). Do you think this is something that could be added to the Spark streaming Hudi source in the future?

Regarding the DeltaStreamer, I think it might be the right solution. I'm just wondering whether I can add my own jar with a transformer class that does the transformation. The docs mention that it's pluggable, but I haven't found any details on that. If my thinking is correct, where does the jar go (I want to run it on EMR)?
