nddipiazza commented on code in PR #2655:
URL: https://github.com/apache/tika/pull/2655#discussion_r2874740419


##########
tika-e2e-tests/tika-grpc/README.md:
##########
@@ -0,0 +1,144 @@
+# Tika gRPC End-to-End Tests
+
+End-to-end integration tests for Apache Tika gRPC Server using Testcontainers.
+
+## Overview
+
+This test module validates the functionality of Apache Tika gRPC Server by:
+- Starting a tika-grpc Docker container using Docker Compose
+- Loading test documents from the GovDocs1 corpus
+- Testing various fetchers (filesystem, Ignite config store, etc.)
+- Verifying parsing results and metadata extraction
+
+## Prerequisites
+
+- Java 17 or later
+- Maven 3.6 or later
+- Docker and Docker Compose
+- Internet connection (for downloading test documents)
+- Docker image `apache/tika-grpc:local` (see below)
+
+## Building
+
+```bash
+./mvnw clean install
+```

Review Comment:
   Fixed — all mvnw references in tika-grpc/README.md updated to ../../mvnw 
(from tika-e2e-tests/tika-grpc/).



##########
tika-e2e-tests/tika-grpc/README.md:
##########
@@ -0,0 +1,144 @@
+# Tika gRPC End-to-End Tests
+
+End-to-end integration tests for Apache Tika gRPC Server using Testcontainers.
+
+## Overview
+
+This test module validates the functionality of Apache Tika gRPC Server by:
+- Starting a tika-grpc Docker container using Docker Compose
+- Loading test documents from the GovDocs1 corpus
+- Testing various fetchers (filesystem, Ignite config store, etc.)
+- Verifying parsing results and metadata extraction

Review Comment:
   Fixed — README rewritten to describe local-server mode as the default and 
clarify Docker is only needed for Docker Compose mode.



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/ignite/IgniteConfigStoreTest.java:
##########
@@ -0,0 +1,591 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.ignite;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.net.URL;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.time.Duration;
+import java.time.temporal.ChronoUnit;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Locale;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import io.grpc.stub.StreamObserver;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestInstance;
+import org.junit.jupiter.api.condition.DisabledOnOs;
+import org.junit.jupiter.api.condition.OS;
+import org.testcontainers.containers.DockerComposeContainer;
+import org.testcontainers.containers.output.Slf4jLogConsumer;
+import org.testcontainers.containers.wait.strategy.Wait;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import org.apache.tika.FetchAndParseReply;
+import org.apache.tika.FetchAndParseRequest;
+import org.apache.tika.SaveFetcherReply;
+import org.apache.tika.SaveFetcherRequest;
+import org.apache.tika.TikaGrpc;
+import org.apache.tika.pipes.fetcher.fs.FileSystemFetcherConfig;
+
+@TestInstance(TestInstance.Lifecycle.PER_CLASS)
+@Testcontainers
+@Slf4j
+@Tag("E2ETest")
+@DisabledOnOs(value = OS.WINDOWS, disabledReason = "Maven not on PATH and 
Docker/Testcontainers not supported on Windows CI")

Review Comment:
   Fixed — @DisabledOnOs reason updated to accurately describe the Windows 
CreateProcess error=206 classpath length limit.



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/ExternalTestBase.java:
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.time.Duration;
+import java.time.temporal.ChronoUnit;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.TestInstance;
+import org.testcontainers.containers.DockerComposeContainer;
+import org.testcontainers.containers.output.Slf4jLogConsumer;
+import org.testcontainers.containers.wait.strategy.Wait;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import org.apache.tika.FetchAndParseReply;
+
+@TestInstance(TestInstance.Lifecycle.PER_CLASS)
+@Testcontainers
+@Slf4j
+@Tag("E2ETest")
+public abstract class ExternalTestBase {
+    public static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+    public static final int MAX_STARTUP_TIMEOUT = 120;
+    public static final String GOV_DOCS_FOLDER = "/tika/govdocs1";
+    public static final File TEST_FOLDER = new File("target", "govdocs1");
+    public static final int GOV_DOCS_FROM_IDX = 
Integer.parseInt(System.getProperty("govdocs1.fromIndex", "1"));
+    public static final int GOV_DOCS_TO_IDX = 
Integer.parseInt(System.getProperty("govdocs1.toIndex", "1"));
+    public static final String DIGITAL_CORPORA_ZIP_FILES_URL = 
"https://corp.digitalcorpora.org/corpora/files/govdocs1/zipfiles";;
+    private static final boolean USE_LOCAL_SERVER = 
Boolean.parseBoolean(System.getProperty("tika.e2e.useLocalServer", "false"));
+    private static final int GRPC_PORT = 
Integer.parseInt(System.getProperty("tika.e2e.grpcPort", "50052"));
+    
+    public static DockerComposeContainer<?> composeContainer;
+    private static Process localGrpcProcess;
+
+    @BeforeAll
+    static void setup() throws Exception {
+        loadGovdocs1();
+        
+        if (USE_LOCAL_SERVER) {
+            startLocalGrpcServer();
+        } else {
+            startDockerGrpcServer();
+        }
+    }
+    
+    private static void startLocalGrpcServer() throws Exception {
+        log.info("Starting local tika-grpc server using Maven exec");
+        
+        Path tikaGrpcDir = findTikaGrpcDirectory();
+        Path configFile = 
Path.of("src/test/resources/tika-config.json").toAbsolutePath();
+        
+        if (!Files.exists(configFile)) {
+            throw new IllegalStateException("Config file not found: " + 
configFile);
+        }
+        
+        log.info("Using tika-grpc from: {}", tikaGrpcDir);
+        log.info("Using config file: {}", configFile);
+        
+        String javaHome = System.getProperty("java.home");
+        boolean isWindows = 
System.getProperty("os.name").toLowerCase(Locale.ROOT).contains("win");
+        String javaCmd = javaHome + (isWindows ? "\\bin\\java.exe" : 
"/bin/java");
+        String mvnCmd = tikaGrpcDir.getParent().resolve(isWindows ? "mvnw.cmd" 
: "mvnw").toString();
+        
+        ProcessBuilder pb = new ProcessBuilder(
+            mvnCmd,
+            "exec:exec",
+            "-Dexec.executable=" + javaCmd,
+            "-Dexec.args=" +
+                "--add-opens=java.base/java.lang=ALL-UNNAMED " +
+                "--add-opens=java.base/java.nio=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED " +
+                "-classpath %classpath " +
+                "org.apache.tika.pipes.grpc.TikaGrpcServer " +
+                "-c " + configFile + " " +
+                "-p " + GRPC_PORT
+        );
+        
+        pb.directory(tikaGrpcDir.toFile());
+        pb.redirectErrorStream(true);
+        pb.redirectOutput(ProcessBuilder.Redirect.PIPE);
+        
+        localGrpcProcess = pb.start();
+        
+        Thread logThread = new Thread(() -> {
+            try (BufferedReader reader = new BufferedReader(
+                    new InputStreamReader(localGrpcProcess.getInputStream(), 
StandardCharsets.UTF_8))) {
+                String line;
+                while ((line = reader.readLine()) != null) {
+                    log.info("tika-grpc: {}", line);
+                }
+            } catch (IOException e) {
+                log.error("Error reading server output", e);
+            }
+        });
+        logThread.setDaemon(true);
+        logThread.start();
+        
+        waitForServerReady();
+        
+        log.info("Local tika-grpc server started successfully on port {}", 
GRPC_PORT);
+    }
+    
+    private static Path findTikaGrpcDirectory() {
+        Path currentDir = Path.of("").toAbsolutePath();
+        Path tikaRootDir = currentDir;
+        
+        while (tikaRootDir != null && 
+               !(Files.exists(tikaRootDir.resolve("tika-grpc")) && 
+                 Files.exists(tikaRootDir.resolve("tika-e2e-tests")))) {
+            tikaRootDir = tikaRootDir.getParent();
+        }
+        
+        if (tikaRootDir == null) {
+            throw new IllegalStateException("Cannot find tika root directory. 
" +
+                "Current dir: " + currentDir);
+        }
+        
+        return tikaRootDir.resolve("tika-grpc");
+    }
+    
+    private static void waitForServerReady() throws Exception {
+        int maxAttempts = 60;
+        for (int i = 0; i < maxAttempts; i++) {
+            try {
+                ManagedChannel testChannel = ManagedChannelBuilder
+                    .forAddress("localhost", GRPC_PORT)
+                    .usePlaintext()
+                    .build();
+                
+                try {
+                    testChannel.getState(true);
+                    TimeUnit.MILLISECONDS.sleep(100);
+                    if 
(testChannel.getState(false).toString().contains("READY")) {
+                        log.info("gRPC server is ready!");
+                        return;
+                    }

Review Comment:
   Fixed — waitForServerReady() now calls listFetchers() via a real gRPC stub 
rather than polling channel state, confirming the service layer is up before 
tests run.



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/ignite/IgniteConfigStoreTest.java:
##########
@@ -0,0 +1,591 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.ignite;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.net.URL;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.time.Duration;
+import java.time.temporal.ChronoUnit;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Locale;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import io.grpc.stub.StreamObserver;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestInstance;
+import org.junit.jupiter.api.condition.DisabledOnOs;
+import org.junit.jupiter.api.condition.OS;
+import org.testcontainers.containers.DockerComposeContainer;
+import org.testcontainers.containers.output.Slf4jLogConsumer;
+import org.testcontainers.containers.wait.strategy.Wait;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import org.apache.tika.FetchAndParseReply;
+import org.apache.tika.FetchAndParseRequest;
+import org.apache.tika.SaveFetcherReply;
+import org.apache.tika.SaveFetcherRequest;
+import org.apache.tika.TikaGrpc;
+import org.apache.tika.pipes.fetcher.fs.FileSystemFetcherConfig;
+
+@TestInstance(TestInstance.Lifecycle.PER_CLASS)
+@Testcontainers
+@Slf4j
+@Tag("E2ETest")
+@DisabledOnOs(value = OS.WINDOWS, disabledReason = "Maven not on PATH and 
Docker/Testcontainers not supported on Windows CI")
+class IgniteConfigStoreTest {
+    
+    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+    private static final int MAX_STARTUP_TIMEOUT = 120;
+    private static final File TEST_FOLDER = new File("target", "govdocs1");
+    private static final int GOV_DOCS_FROM_IDX = 
Integer.parseInt(System.getProperty("govdocs1.fromIndex", "1"));
+    private static final int GOV_DOCS_TO_IDX = 
Integer.parseInt(System.getProperty("govdocs1.toIndex", "1"));
+    private static final String DIGITAL_CORPORA_ZIP_FILES_URL = 
"https://corp.digitalcorpora.org/corpora/files/govdocs1/zipfiles";;
+    private static final boolean USE_LOCAL_SERVER = 
Boolean.parseBoolean(System.getProperty("tika.e2e.useLocalServer", "false"));

Review Comment:
   Fixed — USE_LOCAL_SERVER in IgniteConfigStoreTest now defaults to true.



##########
tika-e2e-tests/tika-grpc/README.md:
##########
@@ -0,0 +1,144 @@
+# Tika gRPC End-to-End Tests
+
+End-to-end integration tests for Apache Tika gRPC Server using Testcontainers.
+
+## Overview
+
+This test module validates the functionality of Apache Tika gRPC Server by:
+- Starting a tika-grpc Docker container using Docker Compose
+- Loading test documents from the GovDocs1 corpus
+- Testing various fetchers (filesystem, Ignite config store, etc.)
+- Verifying parsing results and metadata extraction
+
+## Prerequisites
+
+- Java 17 or later
+- Maven 3.6 or later
+- Docker and Docker Compose
+- Internet connection (for downloading test documents)

Review Comment:
   Fixed — tika-grpc/README.md now lists Docker as optional and only required 
for Docker Compose mode.



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/filesystem/FileSystemFetcherTest.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.filesystem;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+
+import io.grpc.ManagedChannel;
+import io.grpc.stub.StreamObserver;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.condition.DisabledOnOs;
+import org.junit.jupiter.api.condition.OS;
+
+import org.apache.tika.FetchAndParseReply;
+import org.apache.tika.FetchAndParseRequest;
+import org.apache.tika.SaveFetcherReply;
+import org.apache.tika.SaveFetcherRequest;
+import org.apache.tika.TikaGrpc;
+import org.apache.tika.pipes.ExternalTestBase;
+import org.apache.tika.pipes.fetcher.fs.FileSystemFetcherConfig;
+
+@Slf4j
+@DisabledOnOs(value = OS.WINDOWS, disabledReason = "exec:exec classpath 
exceeds Windows CreateProcess command-line length limit")
+class FileSystemFetcherTest extends ExternalTestBase {
+    
+    @Test
+    void testFileSystemFetcher() throws Exception {
+        String fetcherId = "defaultFetcher";
+        ManagedChannel channel = getManagedChannel();
+        TikaGrpc.TikaBlockingStub blockingStub = 
TikaGrpc.newBlockingStub(channel);
+        TikaGrpc.TikaStub tikaStub = TikaGrpc.newStub(channel);
+

Review Comment:
   Fixed — ManagedChannel is now shut down in a try/finally block in 
testFileSystemFetcher().



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/ExternalTestBase.java:
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.time.Duration;
+import java.time.temporal.ChronoUnit;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.TestInstance;
+import org.testcontainers.containers.DockerComposeContainer;
+import org.testcontainers.containers.output.Slf4jLogConsumer;
+import org.testcontainers.containers.wait.strategy.Wait;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import org.apache.tika.FetchAndParseReply;
+
+@TestInstance(TestInstance.Lifecycle.PER_CLASS)
+@Testcontainers
+@Slf4j
+@Tag("E2ETest")
+public abstract class ExternalTestBase {
+    public static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+    public static final int MAX_STARTUP_TIMEOUT = 120;
+    public static final String GOV_DOCS_FOLDER = "/tika/govdocs1";
+    public static final File TEST_FOLDER = new File("target", "govdocs1");
+    public static final int GOV_DOCS_FROM_IDX = 
Integer.parseInt(System.getProperty("govdocs1.fromIndex", "1"));
+    public static final int GOV_DOCS_TO_IDX = 
Integer.parseInt(System.getProperty("govdocs1.toIndex", "1"));
+    public static final String DIGITAL_CORPORA_ZIP_FILES_URL = 
"https://corp.digitalcorpora.org/corpora/files/govdocs1/zipfiles";;
+    private static final boolean USE_LOCAL_SERVER = 
Boolean.parseBoolean(System.getProperty("tika.e2e.useLocalServer", "false"));
+    private static final int GRPC_PORT = 
Integer.parseInt(System.getProperty("tika.e2e.grpcPort", "50052"));
+    
+    public static DockerComposeContainer<?> composeContainer;
+    private static Process localGrpcProcess;
+
+    @BeforeAll
+    static void setup() throws Exception {
+        loadGovdocs1();
+        
+        if (USE_LOCAL_SERVER) {
+            startLocalGrpcServer();
+        } else {
+            startDockerGrpcServer();
+        }
+    }
+    
+    private static void startLocalGrpcServer() throws Exception {
+        log.info("Starting local tika-grpc server using Maven exec");
+        
+        Path tikaGrpcDir = findTikaGrpcDirectory();
+        Path configFile = 
Path.of("src/test/resources/tika-config.json").toAbsolutePath();
+        
+        if (!Files.exists(configFile)) {
+            throw new IllegalStateException("Config file not found: " + 
configFile);
+        }
+        
+        log.info("Using tika-grpc from: {}", tikaGrpcDir);
+        log.info("Using config file: {}", configFile);
+        
+        String javaHome = System.getProperty("java.home");
+        boolean isWindows = 
System.getProperty("os.name").toLowerCase(Locale.ROOT).contains("win");
+        String javaCmd = javaHome + (isWindows ? "\\bin\\java.exe" : 
"/bin/java");
+        String mvnCmd = tikaGrpcDir.getParent().resolve(isWindows ? "mvnw.cmd" 
: "mvnw").toString();
+        
+        ProcessBuilder pb = new ProcessBuilder(
+            mvnCmd,
+            "exec:exec",
+            "-Dexec.executable=" + javaCmd,
+            "-Dexec.args=" +
+                "--add-opens=java.base/java.lang=ALL-UNNAMED " +
+                "--add-opens=java.base/java.nio=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED " +
+                "-classpath %classpath " +
+                "org.apache.tika.pipes.grpc.TikaGrpcServer " +
+                "-c " + configFile + " " +
+                "-p " + GRPC_PORT

Review Comment:
   Fixed — configFile path is now quoted in -Dexec.args to handle paths 
containing spaces.



##########
tika-e2e-tests/tika-grpc/src/test/java/org/apache/tika/pipes/ignite/IgniteConfigStoreTest.java:
##########
@@ -0,0 +1,591 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.ignite;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.net.URL;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.StandardCopyOption;
+import java.time.Duration;
+import java.time.temporal.ChronoUnit;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Locale;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Stream;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.grpc.ManagedChannel;
+import io.grpc.ManagedChannelBuilder;
+import io.grpc.stub.StreamObserver;
+import lombok.extern.slf4j.Slf4j;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.TestInstance;
+import org.junit.jupiter.api.condition.DisabledOnOs;
+import org.junit.jupiter.api.condition.OS;
+import org.testcontainers.containers.DockerComposeContainer;
+import org.testcontainers.containers.output.Slf4jLogConsumer;
+import org.testcontainers.containers.wait.strategy.Wait;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import org.apache.tika.FetchAndParseReply;
+import org.apache.tika.FetchAndParseRequest;
+import org.apache.tika.SaveFetcherReply;
+import org.apache.tika.SaveFetcherRequest;
+import org.apache.tika.TikaGrpc;
+import org.apache.tika.pipes.fetcher.fs.FileSystemFetcherConfig;
+
+@TestInstance(TestInstance.Lifecycle.PER_CLASS)
+@Testcontainers
+@Slf4j
+@Tag("E2ETest")
+@DisabledOnOs(value = OS.WINDOWS, disabledReason = "Maven not on PATH and 
Docker/Testcontainers not supported on Windows CI")
+class IgniteConfigStoreTest {
+    
+    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+    private static final int MAX_STARTUP_TIMEOUT = 120;
+    private static final File TEST_FOLDER = new File("target", "govdocs1");
+    private static final int GOV_DOCS_FROM_IDX = 
Integer.parseInt(System.getProperty("govdocs1.fromIndex", "1"));
+    private static final int GOV_DOCS_TO_IDX = 
Integer.parseInt(System.getProperty("govdocs1.toIndex", "1"));
+    private static final String DIGITAL_CORPORA_ZIP_FILES_URL = 
"https://corp.digitalcorpora.org/corpora/files/govdocs1/zipfiles";;
+    private static final boolean USE_LOCAL_SERVER = 
Boolean.parseBoolean(System.getProperty("tika.e2e.useLocalServer", "false"));
+    private static final int GRPC_PORT = 
Integer.parseInt(System.getProperty("tika.e2e.grpcPort", "50052"));
+    
+    private static DockerComposeContainer<?> igniteComposeContainer;
+    private static Process localGrpcProcess;
+    
+    @BeforeAll
+    static void setupIgnite() throws Exception {
+        if (USE_LOCAL_SERVER) {
+            try {
+                killProcessOnPort(GRPC_PORT);
+                killProcessOnPort(3344);
+                killProcessOnPort(10800);
+            } catch (Exception e) {
+                log.debug("No orphaned processes to clean up");
+            }
+        }
+        
+        if (!TEST_FOLDER.exists() || TEST_FOLDER.listFiles().length == 0) {
+            downloadAndUnzipGovdocs1(GOV_DOCS_FROM_IDX, GOV_DOCS_TO_IDX);
+        }
+        
+        if (USE_LOCAL_SERVER) {
+            startLocalGrpcServer();
+        } else {
+            startDockerGrpcServer();
+        }
+    }
+    
+    private static void startLocalGrpcServer() throws Exception {
+        log.info("Starting local tika-grpc server using Maven");
+        
+        Path currentDir = Path.of("").toAbsolutePath();
+        Path tikaRootDir = currentDir;
+        
+        while (tikaRootDir != null && 
+               !(Files.exists(tikaRootDir.resolve("tika-grpc")) && 
+                 Files.exists(tikaRootDir.resolve("tika-e2e-tests")))) {
+            tikaRootDir = tikaRootDir.getParent();
+        }
+        
+        if (tikaRootDir == null) {
+            throw new IllegalStateException("Cannot find tika root directory. 
" +
+                "Current dir: " + currentDir + ". " +
+                "Please run from within the tika project.");
+        }
+        
+        Path tikaGrpcDir = tikaRootDir.resolve("tika-grpc");
+        if (!Files.exists(tikaGrpcDir)) {
+            throw new IllegalStateException("Cannot find tika-grpc directory 
at: " + tikaGrpcDir);
+        }
+        
+        String configFileName = "tika-config-ignite-local.json";
+        Path configFile = Path.of("src/test/resources/" + 
configFileName).toAbsolutePath();
+        
+        if (!Files.exists(configFile)) {
+            throw new IllegalStateException("Config file not found: " + 
configFile);
+        }
+        
+        log.info("Tika root: {}", tikaRootDir);
+        log.info("Using tika-grpc from: {}", tikaGrpcDir);
+        log.info("Using config file: {}", configFile);
+        
+        // Use mvn exec:exec to run as external process (not exec:java which 
breaks ServiceLoader)
+        String javaHome = System.getProperty("java.home");
+        boolean isWindows = 
System.getProperty("os.name").toLowerCase(Locale.ROOT).contains("win");
+        String javaCmd = javaHome + (isWindows ? "\\bin\\java.exe" : 
"/bin/java");
+        String mvnCmd = tikaRootDir.resolve(isWindows ? "mvnw.cmd" : 
"mvnw").toString();
+        
+        ProcessBuilder pb = new ProcessBuilder(
+            mvnCmd,
+            "exec:exec",
+            "-Dexec.executable=" + javaCmd,
+            "-Dexec.args=" +
+                "--add-opens=java.base/java.lang=ALL-UNNAMED " +
+                "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED " +
+                "--add-opens=java.base/java.lang.reflect=ALL-UNNAMED " +
+                "--add-opens=java.base/java.io=ALL-UNNAMED " +
+                "--add-opens=java.base/java.nio=ALL-UNNAMED " +
+                "--add-opens=java.base/java.math=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED " +
+                "--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
" +
+                "--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED 
" +
+                "--add-opens=java.base/java.time=ALL-UNNAMED " +
+                "--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED " +
+                "--add-opens=java.base/jdk.internal.access=ALL-UNNAMED " +
+                "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED " +
+                
"--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED " +
+                
"--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED " +
+                "-Dio.netty.tryReflectionSetAccessible=true " +
+                "-Dignite.work.dir=" + 
tikaGrpcDir.resolve("target/ignite-work") + " " +
+                "-classpath %classpath " +
+                "org.apache.tika.pipes.grpc.TikaGrpcServer " +
+                "-c " + configFile + " " +

Review Comment:
   Fixed — both -Dignite.work.dir and the -c configFile argument are now quoted 
in -Dexec.args.



##########
tika-e2e-tests/pom.xml:
##########
@@ -0,0 +1,174 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0";
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
https://maven.apache.org/xsd/maven-4.0.0.xsd";>
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.tika</groupId>
+        <artifactId>tika-parent</artifactId>
+        <version>${revision}</version>
+        <relativePath>../tika-parent/pom.xml</relativePath>
+    </parent>
+
+    <artifactId>tika-e2e-tests</artifactId>
+    <packaging>pom</packaging>
+    <name>Apache Tika End-to-End Tests</name>
+    <description>End-to-end integration tests for Apache Tika 
components</description>
+
+    <properties>
+        <maven.compiler.source>17</maven.compiler.source>
+        <maven.compiler.target>17</maven.compiler.target>
+        <maven.compiler.release>17</maven.compiler.release>
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+
+        <!-- Tika version -->
+        <tika.version>${revision}</tika.version>
+
+        <!-- Test dependencies -->
+        <junit.version>5.11.4</junit.version>
+        <testcontainers.version>2.0.3</testcontainers.version>
+
+        <!-- Logging -->
+        <slf4j.version>2.0.16</slf4j.version>
+        <log4j.version>2.25.3</log4j.version>
+
+        <!-- Other -->
+        <lombok.version>1.18.32</lombok.version>
+        <jackson.version>2.18.2</jackson.version>

Review Comment:
   Fixed — jackson.version pin removed from tika-e2e-tests/pom.xml; now 
inherits from tika-parent (2.21.1).



##########
tika-e2e-tests/README.md:
##########
@@ -0,0 +1,59 @@
+# Apache Tika End-to-End Tests
+
+End-to-end integration tests for Apache Tika components.
+
+## Overview
+
+This module contains standalone end-to-end (E2E) tests for various Apache Tika 
distribution formats and deployment modes. Unlike unit and integration tests in 
the main Tika build, these E2E tests validate complete deployment scenarios 
using Docker containers and real-world test data.
+
+**Note:** This module is included in the main Tika build under the `e2e` Maven 
profile (`-Pe2e`). Run `mvn test -Pe2e` from the repo root to execute these 
tests.
+
+## Test Modules
+
+- **tika-grpc** - E2E tests for tika-grpc server
+
+## Prerequisites
+
+- Java 17 or later
+- Maven 3.6 or later
+- Docker and Docker Compose
+- Internet connection (for downloading test documents)
+
+## Building All E2E Tests
+
+From this directory:
+
+```bash
+./mvnw clean install
+```

Review Comment:
   Fixed — all command examples in tika-e2e-tests/README.md updated to use 
../mvnw.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to