gharris1727 commented on PR #14064: URL: https://github.com/apache/kafka/pull/14064#issuecomment-1648540605
> I think TSV is fine for the output format. It'd definitely help if we added headers. 👍
>
> I also found it a bit difficult to parse with the plugin location being listed first

That has been bothering me too, so I'm glad you mentioned it. Especially when the plugins are deep in the filesystem tree, the constant part of the path takes up the whole screen. I'll move it.

> When run on a worker config with no plugin.path property, there's no output. We should at least log a warning in this case, and if there's no legitimate reason to operate on a worker config without a plugin.path property, we should probably fail instead.

Thanks for testing this. I'm conflicted, because we disallow omitting all of the location, path, and config arguments at the same time in an attempt to avoid empty listings. Since an empty plugin.path is permitted in a valid worker config without a warning, I think we also have to permit it here without a warning. We'll trust that users say what they mean, and if they don't actually mean what they say, they'll learn about it from an empty result.

> More generally, if no plugins are detected, the script exits with no output. Perhaps we could summarize how many plugins were found at the end of the script?

I have been doing that with `| wc -l`, which becomes a bit messy with the addition of the header. Perhaps following the direct listing with a separator that is easy to parse (an empty line) and then printing a summary (# of plugins, # of classes, # of sink connectors, # that will be changed by sync-manifests, etc.) would be appealing. That way, anyone who wants to process the raw data themselves can cut the summary off at the separator. I'll explore this a bit more.

> When run on a worker config with plugin.path set to a blank string (i.e., plugin.path=), then the script scans the current directory.
> Since this is also the behavior for Connect workers, I don't think we can disallow this case, but maybe it's worth printing a warning (in both cases)?

Oh no, I didn't realize that we accepted relative paths in the `plugin.path`, but it makes sense given the reuse of `PluginUtils#pluginLocations`. Since this command will very likely be called from a different working directory than the worker itself, I think adding a warning for this is a good idea.

> It seems like we're introducing divergence here between the plugin path CLI and Connect workers in how aliases are computed. Workers compute aliases after doing complete scanning of the plugin path, whereas this script does it for each plugin location. Doesn't that introduce false negatives into the script's collision detection logic?

I justified it to myself as the "most generous" set of aliases, counting only collisions internal to each individual plugin location. But that isn't even correct with the current implementation, since collisions with the classpath also cause aliases to be hidden. I think consistency with the runtime is more important, so I'll change this to compute aliases over the whole run. It's sad to lose the incremental output, though, since the aliases on the first line printed can depend on the class name used in the final plugin found.

> Nearly all of the out-of-the-box plugins for Connect (SMTs, client config override policies, predicates, converters, header converters) have undefined versions. Can we implement the Versioned interface in these plugins so that they show up with valid version info? (If so, it's fine to leave that for a follow-up PR).

This is an amazing suggestion, and I'll certainly implement it in a follow-up PR.
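To make the separator-plus-summary idea from earlier concrete, here is a sketch of how a downstream consumer could split the raw listing from the summary. The output lines below are hypothetical (the real column order and summary wording are still being decided); the point is just that an empty line cleanly terminates the machine-readable section:

```shell
# Hypothetical output: a TSV header, data rows, an empty line, then a summary.
# A consumer that only wants the raw listing can stop at the first blank line:
printf 'name\tversion\tlocation\nFooConnector\t1.0\t/plugins/foo\nBarSmt\t2.1\t/plugins/bar\n\n2 plugins found\n' \
  | awk 'NF == 0 { exit } { print }' \
  | tail -n +2          # also drop the header, leaving only the data rows
```

This keeps the `| wc -l` workflow working too: counting the lines of the filtered output gives the plugin count without the header or summary skewing the number.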
> IIUC this is because the JSON converter implements two different plugin interfaces (Converter and HeaderConverter), but the PluginDesc class is only capable of storing [one plugin type](https://github.com/apache/kafka/blob/84691b11f64e85a7f3f6fdbafd0f8fb2f8dd630c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/PluginDesc.java#L40) for each kind of plugin. It hasn't been (much of) a problem til now, but we should probably address that somehow before merging this PR.

Haha, yeah, that needs some tweaking. I'm worried about the other effects of changing that line, so I'll have to look into it more.

> Finally, I could be missing something, but it seems like we don't have logic to ensure that plugins which implement multiple plugin interfaces (like the aforementioned JSON converter) have all of the necessary service loader manifests; instead, it looks like we only check for at least one manifest. If that's the case, can we start reporting these plugins as not having manifests, in the output lines for the plugin types that they're missing manifests for?

Well, if the PluginDesc constructor were consistent with equals, then a plugin implementing multiple interfaces would show up in two PluginDesc objects, one for each type. Then, as the script iterates over the PluginScanResult, it would perform the migration once as one type and then again as the other type. I think this is broken because of the PluginDesc constructor, so I'll adjust that, and then the migration should work correctly.

@C0urante Thank you for being so thorough; these are details that are really important to get right. That goes for all of the other PRs in this KIP: the feature wouldn't be half as reliable without your scrutiny. I'll take some time to iterate on this and let you know when it's ready for another pass.
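As a concrete illustration of the missing-manifest point above: ServiceLoader manifests live under `META-INF/services/` inside the plugin jar, one file per implemented interface, so a class implementing both interfaces needs an entry in two separate files. This sketch builds the expected layout on disk (the `JsonConverter` class and interface names are the real Kafka ones; the directory is just a stand-in for the jar contents):

```shell
# One manifest file per implemented interface, each naming the implementing class.
mkdir -p META-INF/services
echo 'org.apache.kafka.connect.json.JsonConverter' \
  > META-INF/services/org.apache.kafka.connect.storage.Converter
echo 'org.apache.kafka.connect.json.JsonConverter' \
  > META-INF/services/org.apache.kafka.connect.storage.HeaderConverter
# A check that only looks for "at least one manifest" would not notice
# if one of these two files were absent.
ls META-INF/services
```

If only the `Converter` file exists, `ServiceLoader` would still discover the class as a Converter but silently miss it as a HeaderConverter, which is exactly the partial-manifest case the script should report.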
