lukecwik commented on code in PR #24713:
URL: https://github.com/apache/beam/pull/24713#discussion_r1080655146


##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant 
ensureTimestampWithinBounds(Instant timestamp) {
   public ProcessContinuation process(
       @Element Map<String, String> element,
       RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
-      OutputReceiver<T> receiver)
-      throws Exception {
-    Map<String, String> configuration = new HashMap<>(element);
-
+      OutputReceiver<T> receiver) {
     // Adding the current restriction to the class object to be found by the 
database history
     register(tracker);
+    Map<String, String> configuration = new HashMap<>(element);
+    String cacheKey =
+        configuration.entrySet().stream()

Review Comment:
   nit: note that if this is a hot path then `.stream()` is 2x slower then a 
for loop in many use cases.



##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant 
ensureTimestampWithinBounds(Instant timestamp) {
   public ProcessContinuation process(
       @Element Map<String, String> element,
       RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
-      OutputReceiver<T> receiver)
-      throws Exception {
-    Map<String, String> configuration = new HashMap<>(element);
-
+      OutputReceiver<T> receiver) {
     // Adding the current restriction to the class object to be found by the 
database history
     register(tracker);
+    Map<String, String> configuration = new HashMap<>(element);
+    String cacheKey =
+        configuration.entrySet().stream()
+            .map(entry -> String.format("(%s,%s)", entry.getKey(), 
entry.getValue()))
+            .sorted()
+            .collect(Collectors.joining(","));
     configuration.put(BEAM_INSTANCE_PROPERTY, this.getHashCode());
+    SourceTask task;
 
-    SourceConnector connector = 
connectorClass.getDeclaredConstructor().newInstance();
-    connector.start(configuration);
-
-    SourceTask task = (SourceTask) 
connector.taskClass().getDeclaredConstructor().newInstance();
-
-    try {
+    if (CONNECTOR_CACHE.getIfPresent(cacheKey) == null) {

Review Comment:
   Consider using 
https://guava.dev/releases/26.0-jre/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-
 and having the callable be responsible for creating a new connector



##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant 
ensureTimestampWithinBounds(Instant timestamp) {
   public ProcessContinuation process(
       @Element Map<String, String> element,
       RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
-      OutputReceiver<T> receiver)
-      throws Exception {
-    Map<String, String> configuration = new HashMap<>(element);
-
+      OutputReceiver<T> receiver) {
     // Adding the current restriction to the class object to be found by the 
database history
     register(tracker);
+    Map<String, String> configuration = new HashMap<>(element);
+    String cacheKey =
+        configuration.entrySet().stream()
+            .map(entry -> String.format("(%s,%s)", entry.getKey(), 
entry.getValue()))

Review Comment:
   Why convert this to a string, can you use `element` as the key?
   
   Note that java map equality is defined as
   ```
   Compares the specified object with this map for equality. Returns true if 
the given object is also a map and the two maps represent the same mappings. 
More formally, two maps m1 and m2 represent the same mappings if 
m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works 
properly across different implementations of the Map interface.
   ```
   
   Note that elements are immutable and are typically safe to be store beyond 
the lifetime of a bundle.



##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant 
ensureTimestampWithinBounds(Instant timestamp) {
   public ProcessContinuation process(
       @Element Map<String, String> element,
       RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
-      OutputReceiver<T> receiver)
-      throws Exception {
-    Map<String, String> configuration = new HashMap<>(element);
-
+      OutputReceiver<T> receiver) {
     // Adding the current restriction to the class object to be found by the 
database history
     register(tracker);
+    Map<String, String> configuration = new HashMap<>(element);

Review Comment:
   If the common case is that the configuration is only needed for creating a 
new instance move the new hashmap call to when we actually need to create a 
copy and instead rely on generating the cache key from `element`



##########
sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/KafkaSourceConsumerFnTest.java:
##########
@@ -103,6 +103,7 @@ public void testStoppableKafkaSourceConsumerFn() {
         .setCoder(VarIntCoder.of());
 
     pipeline.run().waitUntilFinish();
+    // Since we're now caching calls

Review Comment:
   Checking for how many times this is called will be flaky since it will all 
depend on how hot the cache remains, is checking how many times this is called 
important or should we drop this?



##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant 
ensureTimestampWithinBounds(Instant timestamp) {
   public ProcessContinuation process(
       @Element Map<String, String> element,
       RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
-      OutputReceiver<T> receiver)
-      throws Exception {
-    Map<String, String> configuration = new HashMap<>(element);
-
+      OutputReceiver<T> receiver) {
     // Adding the current restriction to the class object to be found by the 
database history
     register(tracker);
+    Map<String, String> configuration = new HashMap<>(element);
+    String cacheKey =
+        configuration.entrySet().stream()
+            .map(entry -> String.format("(%s,%s)", entry.getKey(), 
entry.getValue()))
+            .sorted()
+            .collect(Collectors.joining(","));
     configuration.put(BEAM_INSTANCE_PROPERTY, this.getHashCode());
+    SourceTask task;
 
-    SourceConnector connector = 
connectorClass.getDeclaredConstructor().newInstance();
-    connector.start(configuration);
-
-    SourceTask task = (SourceTask) 
connector.taskClass().getDeclaredConstructor().newInstance();
-
-    try {
+    if (CONNECTOR_CACHE.getIfPresent(cacheKey) == null) {
+      SourceConnector connector = null;
+      try {
+        connector = connectorClass.getDeclaredConstructor().newInstance();
+        connector.start(configuration);
+        task = (SourceTask) 
connector.taskClass().getDeclaredConstructor().newInstance();
+      } catch (InstantiationException
+          | IllegalAccessException
+          | InvocationTargetException
+          | NoSuchMethodException e) {
+        throw new RuntimeException("Unable to initialize connector instance 
for Debezium", e);
+      }
       Map<String, ?> consumerOffset = tracker.currentRestriction().offset;
-      LOG.debug("--------- Consumer offset from Debezium Tracker: {}", 
consumerOffset);
+      LOG.debug("--------- Created new Debezium task with offset: {}", 
consumerOffset);
 
       task.initialize(new 
BeamSourceTaskContext(tracker.currentRestriction().offset));
       task.start(connector.taskConfigs(1).get(0));
+      CONNECTOR_CACHE.put(cacheKey, task);
+    } else {
+      task = CONNECTOR_CACHE.getIfPresent(cacheKey);

Review Comment:
   this may have been evicted from the cache since you first checked which is 
why using get with a callable is needed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to