gharris1727 commented on PR #14064:
URL: https://github.com/apache/kafka/pull/14064#issuecomment-1648540605

   > I think TSV is fine for the output format. It'd definitely help if we 
added headers.
   
   👍 
   
   > I also found it a bit difficult to parse with the plugin location being 
listed first
   
   That has been bothering me too, so i'm glad you mentioned it. Especially 
when the plugins are deep in the filesystem tree, the constant part of the path 
takes up the whole screen. I'll move it.
   
   > When run on a worker config with no plugin.path property, there's no 
output. We should at least log a warning in this case, and if there's no 
legitimate reason to operate on a worker config without a plugin.path property, 
we should probably fail instead.
   
   Thanks for testing this. I'm conflicted, because we disallow omitting all of 
the the location, path, and config arguments at the same time in an attempt to 
avoid empty listings. Since an empty plugin.path is permitted in a valid worker 
config without a warning, I think we also have to permit it here without a 
warning. We'll trust the user says what they mean, and if they don't actually 
mean what they say, they'll learn about that with an empty result.
   
   > More generally, if no plugins are detected, the script exits with no 
output. Perhaps we could summarize how many plugins were found at the end of 
the script?
   
   I have been doing that with `| wc -l`, which becomes a bit messy with the 
addition of the header. Perhaps following the direct listing with a separator 
that is easy to parse (an empty line), and then printing a summary (# of 
plugins, # of classes, # of sink connectors, # that will be changed by 
sync-manifests, etc) would be appealing. That way, anyone that wants to process 
the raw data themselves can cut off the summary at the separator. I'll explore 
this a bit more.
   
   > When run on a worker config with plugin.path set to a blank string (i.e., 
plugin.path=), then the script scans the current directory. Since this is also 
the behavior for Connect workers, I don't think we can disallow this case, but 
maybe it's worth printing a warning (in both cases)?
   
   Oh no, I didn't realize that we accepted relative paths in the 
`plugin.path`, but it makes sense given the re-use of the 
`PluginUtils#pluginLocations`. Since this command will very likely be called 
from a different working directory than the worker itself, I think adding a 
warning for this is a good idea.
   
   > It seems like we're introducing divergence here between the plugin path 
CLI and Connect workers in how aliases are computed. Workers compute aliases 
after doing complete scanning of the plugin path, whereas this script does it 
for each plugin location. Doesn't that introduce false negatives into the 
script's collision detection logic?
   
   I justified it to myself that it was the "most generous" set of aliases, 
counting only collisions internal to each individual plugin location. But that 
isn't even correct with the current implementation, since collisions with the 
classpath also cause aliases to be hidden! I think consistency with the runtime 
is more important, so I'll change this to compute aliases over the whole run. 
It's so sad to lose the incremental output though, since the aliases of the 
first line printed depend on the classname used in the final plugin found.
   
   > Nearly all of the out-of-the-box plugins for Connect (SMTs, client config 
override policies, predicates, converters, header converters) have undefined 
versions. Can we implement the Versioned interface in these plugins so that 
they show up with valid version info? (If so, it's fine to leave that for a 
follow-up PR).
   
   This is an amazing suggestion, and I'll certainly implement that in a 
follow-up PR.
   
   > IIUC this is because the JSON converter implements two different plugin 
interfaces (Converter and HeaderConverter), but the PluginDesc class is only 
capable of storing [one plugin 
type](https://github.com/apache/kafka/blob/84691b11f64e85a7f3f6fdbafd0f8fb2f8dd630c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/PluginDesc.java#L40)
 for each kind of plugin. It hasn't been (much of) a problem til now, but we 
should probably address that somehow before merging this PR.
   
   Haha yeah that needs some tweaking. I'm worried about the other effects of 
changing that line, i'll have to look into that more.
   
   > Finally, I could be missing something, but it seems like we don't have 
logic to ensure that plugins which implement multiple plugin interfaces (like 
the aforementioned JSON converter) have all of the necessary service loader 
manifests; instead, it looks like we only check for at least one manifest. If 
that's the case, can we start reporting these plugins as not having manifests, 
in the output lines for the plugin types that they're missing manifests for?
   
   Well, if the PluginDesc constructor was consistent with equals, then if you 
had a plugin implement multiple interfaces it would show up in two PluginDesc 
objects, one for each type. Then, as the script iterates over the 
PluginScanResult, it would perform the migration once as one type and then 
again as the other type. I think this is broken because of the PluginDesc 
constructor, so i'll adjust that and then the migration should work correctly.
   
   @C0urante Thank you for being so thorough, these are details that are really 
important to get right. That goes for all of the other PRs in this KIP, the 
feature wouldn't be half as reliable without your scrutiny. I'll take some time 
to iterate on this and let you know when this is ready for another pass.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to