C0urante commented on a change in pull request #11773:
URL: https://github.com/apache/kafka/pull/11773#discussion_r824130593



##########
File path: 
connect/api/src/main/java/org/apache/kafka/connect/source/SourceConnector.java
##########
@@ -28,4 +30,46 @@
     protected SourceConnectorContext context() {
         return (SourceConnectorContext) context;
     }
+
+    /**
+     * Signals whether the connector supports exactly-once delivery guarantees 
with a proposed configuration.
+     * Developers can assume that worker-level exactly-once support is enabled 
when this method is invoked.
+     *
+     * <p>For backwards compatibility, the default implementation will return 
{@code null}, but connector developers are
+     * strongly encouraged to override this method to return a non-null value 
such as
+     * {@link ExactlyOnceSupport#SUPPORTED SUPPORTED} or {@link 
ExactlyOnceSupport#UNSUPPORTED UNSUPPORTED}.
+     *
+     * <p>Similar to {@link #validate(Map) validate}, this method may be 
called by the runtime before the
+     * {@link #start(Map) start} method is invoked when the connector will be 
run with exactly-once support.

Review comment:
       Oooh, that's an interesting idea.
   
   I think your point about the connector doing extra work is my biggest 
concern on that front, and it seems to be validated (heh) by how `validate` is 
designed (we never actually invoke `validate` on the same `Connector` instances 
that we invoke `start` on at the moment). It may not be very frequent, but 
there certainly are some connectors out there that do some heavy lifting in 
`start` that we wouldn't want to do every time during preflight validation. For 
a practical example, see MirrorMaker2: 
https://github.com/apache/kafka/blob/38e3787d760fba6bdac4c91126e87cd7939ae43f/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L120-L128
   
   Adding a `configure` method would help with this issue by allowing us to 
give a long-lived config to a `Connector` that the connector can hold onto 
across successive invocations of `exactlyOnceSupport` and 
`canDefineTransactionBoundaries` (and possibly even `validate`). But, it would 
also complicate some of the existing preflight validation logic in the 
framework. Right now, we only create a single `Connector` instance per 
connector type per worker for preflight validation, and all invocations of 
`validate` and `config` are performed with that instance (see `AbstractHerder` 
[here](https://github.com/apache/kafka/blob/38e3787d760fba6bdac4c91126e87cd7939ae43f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L431),
 
[here](https://github.com/apache/kafka/blob/38e3787d760fba6bdac4c91126e87cd7939ae43f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L664-L666),
 and [here](https://github.com/apache/kafka/blob/38e3787d760
 
fba6bdac4c91126e87cd7939ae43f/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L461-L479)).
   
   Since we use these validation-only `Connector` instances pretty loosely and 
without any guarantees about order of invocations for `config` and `validate`, 
it's unlikely that there are connectors out there that store state in either of 
these methods, so there could be relatively low risk for creating a new 
validation-only `Connector` instance every time a config has to be validated.
   
   But I have to wonder if we're overthinking things? It seems like we're 
trying to optimize away from methods that accept a raw config map and instead 
only provide that config once per round of validation. Is the objective there 
to create a smoother/better-documented/more-strictly-defined API? Improve 
resource utilization by obviating the need of connector developers to construct 
`AbstractConfig` subclasses (or equivalents) which parse and validate 
individual properties at instantiation time?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to