gharris1727 commented on PR #14064: URL: https://github.com/apache/kafka/pull/14064#issuecomment-1648540605
> I think TSV is fine for the output format. It'd definitely help if we added headers. 👍
>
> I also found it a bit difficult to parse with the plugin location being listed first

That has been bothering me too, so I'm glad you mentioned it. Especially when the plugins are deep in the filesystem tree, the constant part of the path takes up the whole screen. I'll move it.

> When run on a worker config with no plugin.path property, there's no output. We should at least log a warning in this case, and if there's no legitimate reason to operate on a worker config without a plugin.path property, we should probably fail instead.

Thanks for testing this. I'm conflicted, because we disallow omitting all of the location, path, and config arguments at the same time in an attempt to avoid empty listings. Since an empty plugin.path is permitted in a valid worker config without a warning, I think we also have to permit it here without a warning. We'll trust that users say what they mean, and if they don't actually mean what they say, they'll learn about it from an empty result.

> More generally, if no plugins are detected, the script exits with no output. Perhaps we could summarize how many plugins were found at the end of the script?

I have been doing that with `| wc -l`, which becomes a bit messy with the addition of the header. Perhaps following the direct listing with a separator that is easy to parse (an empty line) and then printing a summary (# of plugins, # of classes, # of sink connectors, # that will be changed by sync-manifests, etc.) would be appealing. That way, anyone who wants to process the raw data themselves can cut the summary off at the separator. I'll explore this a bit more.

> When run on a worker config with plugin.path set to a blank string (i.e., plugin.path=), then the script scans the current directory.
> Since this is also the behavior for Connect workers, I don't think we can disallow this case, but maybe it's worth printing a warning (in both cases)?

Oh no, I didn't realize that we accepted relative paths in the `plugin.path`, but it makes sense given the reuse of `PluginUtils#pluginLocations`. Since this command will very likely be called from a different working directory than the worker itself, I think adding a warning for this is a good idea.

> It seems like we're introducing divergence here between the plugin path CLI and Connect workers in how aliases are computed. Workers compute aliases after doing complete scanning of the plugin path, whereas this script does it for each plugin location. Doesn't that introduce false negatives into the script's collision detection logic?

I justified it to myself as the "most generous" set of aliases, counting only collisions internal to each individual plugin location. But that isn't even correct with the current implementation, since collisions with the classpath also cause aliases to be hidden. I think consistency with the runtime is more important, so I'll change this to compute aliases over the whole run. It's sad to lose the incremental output, though, since the aliases on the first line printed can depend on the class name used in the final plugin found.

> Nearly all of the out-of-the-box plugins for Connect (SMTs, client config override policies, predicates, converters, header converters) have undefined versions. Can we implement the Versioned interface in these plugins so that they show up with valid version info? (If so, it's fine to leave that for a follow-up PR).

This is an amazing suggestion, and I'll certainly implement it in a follow-up PR.
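To make the separator-plus-summary idea from earlier concrete, here is a sketch of how a downstream consumer could split the raw listing from the summary. The output lines below are hypothetical (the real column order and summary wording are still being decided); the point is just that an empty line cleanly terminates the machine-readable section:

```shell
# Hypothetical output: a TSV header, data rows, an empty line, then a summary.
# A consumer that only wants the raw listing can stop at the first blank line:
printf 'name\tversion\tlocation\nFooConnector\t1.0\t/plugins/foo\nBarSmt\t2.1\t/plugins/bar\n\n2 plugins found\n' \
  | awk 'NF == 0 { exit } { print }' \
  | tail -n +2          # also drop the header, leaving only the data rows
```

This keeps the `| wc -l` workflow working too: counting the lines of the filtered output gives the plugin count without the header or summary skewing the number.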
> IIUC this is because the JSON converter implements two different plugin interfaces (Converter and HeaderConverter), but the PluginDesc class is only capable of storing [one plugin type](https://github.com/apache/kafka/blob/84691b11f64e85a7f3f6fdbafd0f8fb2f8dd630c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/PluginDesc.java#L40) for each kind of plugin. It hasn't been (much of) a problem til now, but we should probably address that somehow before merging this PR.

Haha, yeah, that needs some tweaking. I'm worried about the other effects of changing that line, so I'll have to look into it more.

> Finally, I could be missing something, but it seems like we don't have logic to ensure that plugins which implement multiple plugin interfaces (like the aforementioned JSON converter) have all of the necessary service loader manifests; instead, it looks like we only check for at least one manifest. If that's the case, can we start reporting these plugins as not having manifests, in the output lines for the plugin types that they're missing manifests for?

Well, if the PluginDesc constructor were consistent with equals, then a plugin implementing multiple interfaces would show up in two PluginDesc objects, one for each type. Then, as the script iterates over the PluginScanResult, it would perform the migration once as one type and then again as the other type. I think this is broken because of the PluginDesc constructor, so I'll adjust that, and then the migration should work correctly.

@C0urante Thank you for being so thorough; these are details that are really important to get right. That goes for all of the other PRs in this KIP: the feature wouldn't be half as reliable without your scrutiny. I'll take some time to iterate on this and let you know when it's ready for another pass.
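As a concrete illustration of the missing-manifest point above: ServiceLoader manifests live under `META-INF/services/` inside the plugin jar, one file per implemented interface, so a class implementing both interfaces needs an entry in two separate files. This sketch builds the expected layout on disk (the `JsonConverter` class and interface names are the real Kafka ones; the directory is just a stand-in for the jar contents):

```shell
# One manifest file per implemented interface, each naming the implementing class.
mkdir -p META-INF/services
echo 'org.apache.kafka.connect.json.JsonConverter' \
  > META-INF/services/org.apache.kafka.connect.storage.Converter
echo 'org.apache.kafka.connect.json.JsonConverter' \
  > META-INF/services/org.apache.kafka.connect.storage.HeaderConverter
# A check that only looks for "at least one manifest" would not notice
# if one of these two files were absent.
ls META-INF/services
```

If only the `Converter` file exists, `ServiceLoader` would still discover the class as a Converter but silently miss it as a HeaderConverter, which is exactly the partial-manifest case the script should report.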
