garyli1019 commented on a change in pull request #1486: [HUDI-759] Integrate
checkpoint provider with delta streamer
URL: https://github.com/apache/incubator-hudi/pull/1486#discussion_r407793139
##########
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
##########
@@ -90,35 +90,33 @@
public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc) throws
IOException {
this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath,
jssc.hadoopConfiguration()),
- getDefaultHiveConf(jssc.hadoopConfiguration()));
+ jssc.hadoopConfiguration(), null);
}
public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc,
TypedProperties props) throws IOException {
this(cfg, jssc, FSUtils.getFs(cfg.targetBasePath,
jssc.hadoopConfiguration()),
- getDefaultHiveConf(jssc.hadoopConfiguration()), props);
+ jssc.hadoopConfiguration(), props);
}
- public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs,
HiveConf hiveConf,
- TypedProperties properties) throws IOException {
- this.cfg = cfg;
- this.deltaSyncService = new DeltaSyncService(cfg, jssc, fs, hiveConf,
properties);
+ public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs,
Configuration hiveConf) throws IOException {
+ this(cfg, jssc, fs, hiveConf, null);
}
- public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs,
HiveConf hiveConf) throws IOException {
+ public HoodieDeltaStreamer(Config cfg, JavaSparkContext jssc, FileSystem fs,
Configuration hiveConf,
+ TypedProperties properties) throws IOException {
+ if (cfg.initialCheckpointProvider != null && cfg.bootstrapFromPath != null
&& cfg.checkpoint == null) {
+ InitialCheckPointProvider checkPointProvider =
+
UtilHelpers.createInitialCheckpointProvider(cfg.initialCheckpointProvider, new
Path(cfg.bootstrapFromPath), fs);
+ cfg.checkpoint = checkPointProvider.getCheckpoint();
Review comment:
I believe the initial checkpoint provider should be just used once when the
user wants to switch from one source to another. After that, the delta streamer
should be able to get the checkpoint from the previous commit. We can improve
this once the bootstrap is ready. At this point, I am not sure how to put
everything together if we want one step to handling everything.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services