walterddr commented on code in PR #10045:
URL: https://github.com/apache/pinot/pull/10045#discussion_r1063598717
##########
pinot-connectors/pinot-flink-connector/src/main/java/org/apache/pinot/connector/flink/sink/PinotSinkFunction.java:
##########
@@ -151,4 +148,18 @@ private void flush()
LOG.info("Pinot segment uploaded to {}", segmentURI);
});
}
+
+ @Override
+ public List<GenericRow> snapshotState(long checkpointId, long timestamp) {
Review Comment:
+1. I am not sure I fully follow the discussion here, so let me try to
clarify the background. I think:
- the general idea of the Pinot-Flink sink is to backfill data directly
into Pinot;
- checkpointing serves the purpose of swiftly resuming a failed Flink job
without having to restart from scratch;
- restarting from scratch is what most users currently do: they create
multiple Flink jobs, each responsible for a specific time range; if a job
fails, the user manually cleans up all the segments associated with that
time range and restarts the backfill;
- this PR attempts to enable checkpointing so that restarting the backfill
job entirely is not necessary (see the sketch right after this list);
- with this in mind, the in-memory vs. RocksDB state backend choice comes
into play in terms of OOM pressure, but I think that should be a judgment
call for the user who operates the Flink job (a config sketch follows at
the end of this comment).
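
To make the recovery path concrete, here is a minimal sketch of what
buffering rows into Flink-managed state can look like. Only the
`snapshotState` signature comes from the diff above; the class name, field
names, and flush logic are placeholders of mine, and the sketch assumes
Flink's (since-deprecated) `ListCheckpointed` interface, which that
signature matches:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.pinot.spi.data.readers.GenericRow;

// Illustrative sink, not the PR's class.
public class BufferingPinotSink extends RichSinkFunction<GenericRow>
    implements ListCheckpointed<GenericRow> {

  // Rows received since the last segment flush (hypothetical field).
  private final List<GenericRow> _pendingRows = new ArrayList<>();

  @Override
  public void invoke(GenericRow row, Context context)
      throws Exception {
    _pendingRows.add(row);
    // ... build and upload a Pinot segment once the buffer is full,
    // then clear _pendingRows ...
  }

  @Override
  public List<GenericRow> snapshotState(long checkpointId, long timestamp) {
    // Flink persists this list in the configured state backend; with the
    // heap backend large buffers add OOM pressure, with RocksDB they spill
    // to disk, which is the trade-off discussed above.
    return new ArrayList<>(_pendingRows);
  }

  @Override
  public void restoreState(List<GenericRow> state) {
    // On recovery the un-flushed rows come back, so the job resumes from
    // the last checkpoint instead of redoing the whole backfill.
    _pendingRows.clear();
    _pendingRows.addAll(state);
  }
}
```

With something like that in place, a job restarted from its last checkpoint
re-buffers exactly the rows that had not yet made it into a segment.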
But regardless, I think this PR serves the purpose well, and thus I also
agree we can follow up in a separate issue/PR.
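
On the state backend point: picking heap vs. RocksDB is a job-setup decision
on the user side, not something the sink needs to hardcode. A minimal
sketch, assuming Flink 1.13+ APIs and a placeholder checkpoint path:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical job setup, not part of this PR.
public final class BackfillJobSetup {

  public static StreamExecutionEnvironment configure(boolean spillToDisk) {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    if (spillToDisk) {
      // RocksDB keeps checkpointed rows on local disk: slower per record,
      // but large buffers do not pressure the JVM heap.
      env.setStateBackend(new EmbeddedRocksDBStateBackend());
    } else {
      // Heap backend is faster, but buffered rows count against the heap.
      env.setStateBackend(new HashMapStateBackend());
    }
    // Placeholder checkpoint location; checkpoint every 60s.
    env.getCheckpointConfig()
        .setCheckpointStorage("file:///tmp/backfill-checkpoints");
    env.enableCheckpointing(60_000L);
    return env;
  }
}
```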