boyuanzz commented on a change in pull request #13592:
URL: https://github.com/apache/beam/pull/13592#discussion_r547445369



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java
##########
@@ -439,11 +444,37 @@ public IsBounded isBounded() {
     private static final Logger LOG = 
LoggerFactory.getLogger(UnboundedSourceAsSDFWrapperFn.class);
     private static final int DEFAULT_BUNDLE_FINALIZATION_LIMIT_MINS = 10;
     private final Coder<CheckpointT> checkpointCoder;
+    private Cache<UnboundedSourceRestriction, UnboundedReader> cachedReaders;
 
     private UnboundedSourceAsSDFWrapperFn(Coder<CheckpointT> checkpointCoder) {
       this.checkpointCoder = checkpointCoder;
     }
 
+    private UnboundedSourceRestriction createCacheKey(
+        UnboundedSource<OutputT, CheckpointT> source, CheckpointT checkpoint) {
+      // For caching reader, we don't care about the watermark.
+      return UnboundedSourceRestriction.create(
+          source, checkpoint, BoundedWindow.TIMESTAMP_MIN_VALUE);
+    }
+
+    @Setup
+    public void setUp() throws Exception {
+      cachedReaders =
+          CacheBuilder.newBuilder()
+              .expireAfterWrite(5, TimeUnit.MINUTES)

Review comment:
       5 mins might be high for DirectRunner but I feel like a few seconds 
might be too small for a distributed system like Dataflow, especially for a 
long run streaming application. How about we start from 1min?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to