Re: [PR] CASSANDRA-19452 Use constant reference time during bulk read process [cassandra-analytics]

via GitHub Thu, 29 Feb 2024 14:39:27 -0800


yifan-c commented on code in PR #44:
URL: https://github.com/apache/cassandra-analytics/pull/44#discussion_r1508274918



##########
cassandra-bridge/src/main/java/org/apache/cassandra/spark/utils/TimeProvider.java:
##########
@@ -19,16 +19,37 @@
 
 package org.apache.cassandra.spark.utils;
 
+import java.util.concurrent.TimeUnit;
+
 /**
  * Provides current time
   */
-@FunctionalInterface
 public interface TimeProvider
 {
-    TimeProvider INSTANCE = () -> (int) Math.floorDiv(System.currentTimeMillis(), 1000L);
+    TimeProvider DEFAULT = new TimeProvider()
+    {
+        private final int referenceEpoch = nowInSeconds();

Review Comment:
   My thinking is that SBR reads a snapshot. Ideally, whatever is unexpired at 
the moment the snapshot is taken should be read back, making the read 
"snapshot-isolated-ish". Using a fixed reference time for the duration of the 
job can achieve that, especially when the job itself creates the snapshot. 
   What is the "long running spark cluster"? Is it long-running because the 
_one_ snapshot has a large amount of data?
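
   To make the idea concrete, here is a minimal sketch of a provider that pins 
the reference time once and reuses it for every expiry check in the job. All 
names (`FixedReferenceTimeSketch`, `ReferenceTimeProvider`, `fixedAt`, 
`isExpired`) are hypothetical and not the API in this PR, and the expiry rule 
(expired when the expiration time is at or before the reference time) is an 
assumption made for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: these names are illustrative, not the PR's actual API.
final class FixedReferenceTimeSketch
{
    interface ReferenceTimeProvider
    {
        // Wall-clock time in seconds since the epoch; changes on every call.
        default int nowInSeconds()
        {
            return (int) TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
        }

        // The time every expiry check in the job compares against; must stay constant.
        int referenceEpochInSeconds();
    }

    // Pins the reference time to the given value, e.g. the moment the snapshot was taken.
    static ReferenceTimeProvider fixedAt(int referenceEpochInSeconds)
    {
        return () -> referenceEpochInSeconds;
    }

    // Assumed rule: data is expired once its expiration time is at or before the reference time.
    static boolean isExpired(int expiresAtSeconds, ReferenceTimeProvider provider)
    {
        return expiresAtSeconds <= provider.referenceEpochInSeconds();
    }

    public static void main(String[] args)
    {
        // Pretend the snapshot was taken at t = 1000 seconds.
        ReferenceTimeProvider provider = fixedAt(1_000);

        // Expires after the snapshot time: still readable, however long the job runs.
        System.out.println(isExpired(1_500, provider));
        // Expired before the snapshot time: filtered out.
        System.out.println(isExpired(900, provider));
    }
}
```

   With a provider like this, the outcome of an expiry check depends only on 
the pinned reference time, not on how long the job takes to reach a given row.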



