garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint 
from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-594767090
 
 
   Yeah, I definitely agree that there is some work to do to improve the 
migration process to the delta streamer. In order to use 
`deltastreamer.checkpoint.reset_key` I will need something like the 
`checkpointGenerator` mentioned above; otherwise it would be difficult to find 
the correct checkpoint for each table. I have a few hundred tables to 
manage, so I do need a robust and trustworthy solution for the migration.
   Also, I think it makes sense to give more options to the users to play 
around with the delta streamer for their own use cases.  
   e.g. 
   - Allow the user to get the checkpoint from commits older than the last 
commit (this PR)
   - Allow the user to get the checkpoint from a specific commit
   - Allow the user to store checkpoint info in the commit metadata even if 
they are not using the delta streamer, for example when they are using the HDFS 
importer or the Spark Datasource writer to do the initial bulk_insert.
   - Maybe more ...
   
   With that flexibility, I believe the user will be able to use the delta 
streamer in a more programmatic way. 
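
   To illustrate the first two options, here is a minimal sketch of the 
lookup, not the Hudi API. It models the timeline as a plain list of 
(instant time, extra metadata) pairs and walks it from newest to oldest until 
a commit carrying a checkpoint is found. The function name `find_checkpoint`, 
the `at_or_before` parameter, the sample data, and the use of 
`deltastreamer.checkpoint.key` as the metadata key are all assumptions made 
for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed name of the metadata entry that holds the checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"

# Hypothetical model of a commit: (instant time, extra metadata).
Commit = Tuple[str, Dict[str, str]]


def find_checkpoint(
    commits: Sequence[Commit],
    at_or_before: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (instant_time, checkpoint) from the newest commit that has one.

    If at_or_before is given, commits newer than that instant are skipped,
    which covers the "checkpoint from a specific commit" case. Instant times
    are assumed to be fixed-width timestamp strings, so string comparison
    matches chronological order.
    """
    newest_first = sorted(commits, key=lambda c: c[0], reverse=True)
    for instant, metadata in newest_first:
        if at_or_before is not None and instant > at_or_before:
            continue  # newer than the requested commit
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            return instant, checkpoint
    return None  # no commit in range carries a checkpoint


# Hypothetical timeline: only the first two commits were written with a
# checkpoint; the last one came from a writer that stored none.
timeline = [
    ("20200301000000", {CHECKPOINT_KEY: "topic,0:100"}),
    ("20200302000000", {CHECKPOINT_KEY: "topic,0:250"}),
    ("20200303000000", {}),
]
```

   With this model, `find_checkpoint(timeline)` skips the last commit and 
falls back to the one before it, while 
`find_checkpoint(timeline, at_or_before="20200301000000")` pins the lookup to 
the first commit.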
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
