garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-594767090

Yeah, I definitely agree that there is some work to do to improve the migration process to the delta streamer. In order to use `deltastreamer.checkpoint.reset_key`, I will need something like the `checkpointGenerator` mentioned above; otherwise it would be difficult to find the correct checkpoint for each table. I have a few hundred tables to manage, so I need a robust and trustworthy solution for the migration.

Also, I think it makes sense to give users more options to adapt the delta streamer to their own use cases, e.g.:

- Allow the user to get the checkpoint from commits older than the last commit (this PR)
- Allow the user to get the checkpoint from a specific commit
- Allow the user to store checkpoint info in the commit metadata even if they are not using the delta streamer, for example when they use the HDFS importer or the Spark Datasource writer to do the initial bulk_insert
- Maybe more ...

With this flexibility, I believe users will be able to use the delta streamer in a more programmatic way.
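The first two options in the list (checkpoint from an older commit, checkpoint from a specific commit) come down to searching the commit timeline's metadata. Below is a minimal sketch of that lookup, not Hudi's implementation. The timeline is modeled as a hypothetical list of `(instant_time, extra_metadata)` pairs, ordered oldest to newest. The metadata key `deltastreamer.checkpoint.key` is an assumption, so check it against the Hudi version you run. `latest_checkpoint` and `checkpoint_at` are illustrative names, not Hudi APIs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical timeline model (not a Hudi API):
# (instant_time, extra_metadata), ordered oldest -> newest.
Commit = Tuple[str, Dict[str, str]]

# Assumed metadata key for the stored checkpoint; verify for your Hudi version.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def latest_checkpoint(commits: List[Commit], key: str = CHECKPOINT_KEY) -> Optional[str]:
    """Walk commits newest-first and return the first non-empty checkpoint.

    Commits written without the delta streamer (e.g. a Spark Datasource
    bulk_insert) may carry no checkpoint. They are skipped rather than
    ending the search at the last commit.
    """
    for _instant, meta in reversed(commits):
        value = meta.get(key)
        if value:
            return value
    return None


def checkpoint_at(commits: List[Commit], instant: str, key: str = CHECKPOINT_KEY) -> Optional[str]:
    """Return the checkpoint stored in one specific commit, or None."""
    for inst, meta in commits:
        if inst == instant:
            return meta.get(key) or None
    return None
```

For example, if a delta streamer commit is followed by two commits without checkpoint info, `latest_checkpoint` still finds the older checkpoint. `checkpoint_at` reports `None` for the commits that did not record one.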
