Copilot commented on code in PR #2837:
URL: https://github.com/apache/tika/pull/2837#discussion_r3308620683
##########
tika-pipes/tika-async-cli/src/main/java/org/apache/tika/async/cli/PluginsWriter.java:
##########
@@ -43,50 +43,35 @@ public PluginsWriter(SimpleAsyncConfig simpleAsyncConfig,
Path pluginsConfig) {
}
void write(Path output) throws IOException {
- Path baseInput = StringUtils.isBlank(simpleAsyncConfig.getInputDir())
- ? Paths.get(".").toAbsolutePath()
- : Paths.get(simpleAsyncConfig.getInputDir());
- Path baseOutput = StringUtils.isBlank(simpleAsyncConfig.getOutputDir())
- ? null
- : Paths.get(simpleAsyncConfig.getOutputDir());
- if (Files.isRegularFile(baseInput)) {
+ boolean userConfigProvided =
!StringUtils.isBlank(simpleAsyncConfig.getTikaConfig());
+ boolean inputExplicit =
!StringUtils.isBlank(simpleAsyncConfig.getInputDir());
+ boolean outputExplicit =
!StringUtils.isBlank(simpleAsyncConfig.getOutputDir());
+
+ // Resolve baseInput. If -i is explicit, use it. If not and the user
+ // didn't supply --config, fall back to '.' so the template's
+ // FETCHER_BASE_PATH placeholder gets a sane default. If --config is
+ // supplied and -i isn't, baseInput stays null so we don't trample the
+ // user's own basePath values.
+ Path baseInput = null;
+ if (inputExplicit) {
+ baseInput = Paths.get(simpleAsyncConfig.getInputDir());
+ } else if (!userConfigProvided) {
Review Comment:
When `--config` is supplied without `-i`, `baseInput` is left null even if
the user config does not replace the template `fetchers`/`pipes-iterator`
sections. In that partial-config case, the generated config keeps the literal
`FETCHER_BASE_PATH` placeholders from `config-template.json`, whereas the
previous behavior patched them to `.`. Consider only skipping the default input
patch when the merged config actually supplies its own filesystem base paths,
or patching remaining placeholders to the default path.
##########
tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/AbstractComponentManager.java:
##########
@@ -115,6 +115,21 @@ protected Map<String, ExtensionConfig>
validateAndCollectConfigs(
Map<String, ExtensionConfig> configs = new HashMap<>();
if (configNode != null && !configNode.isNull()) {
+ // Strict shape check. The section must be a JSON object keyed by
+ // instance ID — e.g. {"my-fetcher": {"file-system-fetcher":
{...}}}.
+ // Without this, an array like
+ // "fetchers": [{"file-system-fetcher": {"id": "my-fetcher",
...}}]
+ // would be silently walked past (JsonNode.fields() on an ArrayNode
+ // returns an empty iterator), leaving the manager with no
registered
+ // components and the user with an "Available: []" error at lookup
+ // time instead of at load time.
+ if (!configNode.isObject()) {
+ throw new TikaConfigException(
+ "Invalid '" + getConfigKey() + "' configuration:
expected a JSON "
+ + "object keyed by instance ID, e.g.
{\"my-id\": {\"type-name\": "
+ + "{...config...}}}. Got " +
configNode.getNodeType() + ". "
+ + "(Array-style configurations are not
supported.)");
Review Comment:
This strict object-only check invalidates several checked-in gRPC/Ignite
configs that still use array-form `fetchers`/`emitters` (for example
`tika-e2e-tests/tika-grpc/src/test/resources/tika-config-ignite-local.json`,
which `IgniteConfigStoreTest` starts the server with). Those configs now fail
at load time before the server can start, so the remaining array-form
repository configs should be migrated to the id-keyed object schema in the same
change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]