lukecwik commented on code in PR #24713:
URL: https://github.com/apache/beam/pull/24713#discussion_r1080655146
##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant
ensureTimestampWithinBounds(Instant timestamp) {
public ProcessContinuation process(
@Element Map<String, String> element,
RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
- OutputReceiver<T> receiver)
- throws Exception {
- Map<String, String> configuration = new HashMap<>(element);
-
+ OutputReceiver<T> receiver) {
// Adding the current restriction to the class object to be found by the
database history
register(tracker);
+ Map<String, String> configuration = new HashMap<>(element);
+ String cacheKey =
+ configuration.entrySet().stream()
Review Comment:
nit: note that if this is a hot path then `.stream()` is 2x slower then a
for loop in many use cases.
##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant
ensureTimestampWithinBounds(Instant timestamp) {
public ProcessContinuation process(
@Element Map<String, String> element,
RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
- OutputReceiver<T> receiver)
- throws Exception {
- Map<String, String> configuration = new HashMap<>(element);
-
+ OutputReceiver<T> receiver) {
// Adding the current restriction to the class object to be found by the
database history
register(tracker);
+ Map<String, String> configuration = new HashMap<>(element);
+ String cacheKey =
+ configuration.entrySet().stream()
+ .map(entry -> String.format("(%s,%s)", entry.getKey(),
entry.getValue()))
+ .sorted()
+ .collect(Collectors.joining(","));
configuration.put(BEAM_INSTANCE_PROPERTY, this.getHashCode());
+ SourceTask task;
- SourceConnector connector =
connectorClass.getDeclaredConstructor().newInstance();
- connector.start(configuration);
-
- SourceTask task = (SourceTask)
connector.taskClass().getDeclaredConstructor().newInstance();
-
- try {
+ if (CONNECTOR_CACHE.getIfPresent(cacheKey) == null) {
Review Comment:
Consider using
https://guava.dev/releases/26.0-jre/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-
and having the callable be responsible for creating a new connector
##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant
ensureTimestampWithinBounds(Instant timestamp) {
public ProcessContinuation process(
@Element Map<String, String> element,
RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
- OutputReceiver<T> receiver)
- throws Exception {
- Map<String, String> configuration = new HashMap<>(element);
-
+ OutputReceiver<T> receiver) {
// Adding the current restriction to the class object to be found by the
database history
register(tracker);
+ Map<String, String> configuration = new HashMap<>(element);
+ String cacheKey =
+ configuration.entrySet().stream()
+ .map(entry -> String.format("(%s,%s)", entry.getKey(),
entry.getValue()))
Review Comment:
Why convert this to a string, can you use `element` as the key?
Note that java map equality is defined as
```
Compares the specified object with this map for equality. Returns true if
the given object is also a map and the two maps represent the same mappings.
More formally, two maps m1 and m2 represent the same mappings if
m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works
properly across different implementations of the Map interface.
```
Note that elements are immutable and are typically safe to be store beyond
the lifetime of a bundle.
##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant
ensureTimestampWithinBounds(Instant timestamp) {
public ProcessContinuation process(
@Element Map<String, String> element,
RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
- OutputReceiver<T> receiver)
- throws Exception {
- Map<String, String> configuration = new HashMap<>(element);
-
+ OutputReceiver<T> receiver) {
// Adding the current restriction to the class object to be found by the
database history
register(tracker);
+ Map<String, String> configuration = new HashMap<>(element);
Review Comment:
If the common case is that the configuration is only needed for creating a
new instance move the new hashmap call to when we actually need to create a
copy and instead rely on generating the cache key from `element`
##########
sdks/java/io/debezium/src/test/java/org/apache/beam/io/debezium/KafkaSourceConsumerFnTest.java:
##########
@@ -103,6 +103,7 @@ public void testStoppableKafkaSourceConsumerFn() {
.setCoder(VarIntCoder.of());
pipeline.run().waitUntilFinish();
+ // Since we're now caching calls
Review Comment:
Checking for how many times this is called will be flaky since it will all
depend on how hot the cache remains, is checking how many times this is called
important or should we drop this?
##########
sdks/java/io/debezium/src/main/java/org/apache/beam/io/debezium/KafkaSourceConsumerFn.java:
##########
@@ -217,26 +239,44 @@ private static Instant
ensureTimestampWithinBounds(Instant timestamp) {
public ProcessContinuation process(
@Element Map<String, String> element,
RestrictionTracker<OffsetHolder, Map<String, Object>> tracker,
- OutputReceiver<T> receiver)
- throws Exception {
- Map<String, String> configuration = new HashMap<>(element);
-
+ OutputReceiver<T> receiver) {
// Adding the current restriction to the class object to be found by the
database history
register(tracker);
+ Map<String, String> configuration = new HashMap<>(element);
+ String cacheKey =
+ configuration.entrySet().stream()
+ .map(entry -> String.format("(%s,%s)", entry.getKey(),
entry.getValue()))
+ .sorted()
+ .collect(Collectors.joining(","));
configuration.put(BEAM_INSTANCE_PROPERTY, this.getHashCode());
+ SourceTask task;
- SourceConnector connector =
connectorClass.getDeclaredConstructor().newInstance();
- connector.start(configuration);
-
- SourceTask task = (SourceTask)
connector.taskClass().getDeclaredConstructor().newInstance();
-
- try {
+ if (CONNECTOR_CACHE.getIfPresent(cacheKey) == null) {
+ SourceConnector connector = null;
+ try {
+ connector = connectorClass.getDeclaredConstructor().newInstance();
+ connector.start(configuration);
+ task = (SourceTask)
connector.taskClass().getDeclaredConstructor().newInstance();
+ } catch (InstantiationException
+ | IllegalAccessException
+ | InvocationTargetException
+ | NoSuchMethodException e) {
+ throw new RuntimeException("Unable to initialize connector instance
for Debezium", e);
+ }
Map<String, ?> consumerOffset = tracker.currentRestriction().offset;
- LOG.debug("--------- Consumer offset from Debezium Tracker: {}",
consumerOffset);
+ LOG.debug("--------- Created new Debezium task with offset: {}",
consumerOffset);
task.initialize(new
BeamSourceTaskContext(tracker.currentRestriction().offset));
task.start(connector.taskConfigs(1).get(0));
+ CONNECTOR_CACHE.put(cacheKey, task);
+ } else {
+ task = CONNECTOR_CACHE.getIfPresent(cacheKey);
Review Comment:
this may have been evicted from the cache since you first checked which is
why using get with a callable is needed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]