nddipiazza opened a new pull request, #2500: URL: https://github.com/apache/tika/pull/2500
## JIRA Ticket

https://issues.apache.org/jira/browse/TIKA-4600

## Summary

Adds end-to-end (E2E) tests for tika-grpc in a new standalone `tika-e2e-tests` module. This integrates the existing tika-grpc-e2e-test implementation into the main Tika repository.

## Changes

- **Created `tika-e2e-tests/` standalone module** (NOT included in parent POM)
  - Independent build lifecycle - can be built/tested separately
  - Won't slow down main Tika build
  - Can be integrated into CI/CD as separate step
- **Integrated tika-grpc E2E tests** as `tika-e2e-tests/tika-grpc`
  - Migrated from external tika-grpc-e2e-test repository
  - Tests use Testcontainers and Docker Compose
  - Validates gRPC server startup, document parsing, metadata extraction
  - Tests filesystem fetcher and Ignite config store
- **Added parent POM** with shared dependency management
- **Included sample configurations** for OCR, GROBID, NER, vision, etc.
- **Added comprehensive documentation** for running E2E tests

## Structure

```
tika/
├── tika-app/
├── tika-core/
├── ... (other modules)
├── tika-e2e-tests/          ← NEW (standalone, not in parent POM)
│   ├── pom.xml              ← Parent POM for all E2E tests
│   ├── README.md
│   └── tika-grpc/           ← First E2E test module
│       ├── pom.xml
│       ├── README.md
│       ├── sample-configs/
│       └── src/test/
└── pom.xml                  ← Does NOT reference tika-e2e-tests
```

## Why Standalone?

The E2E tests are intentionally NOT part of the main build because they:

- Require Docker and Testcontainers
- Take significantly longer than unit tests
- Need external resources (GovDocs1 corpus, Docker images)
- Should be run selectively by developers
- Can be integrated into release pipeline as separate step

## Testing

Build and run E2E tests:

```bash
cd tika-e2e-tests
mvn clean install
mvn test
```

Run specific tests:

```bash
cd tika-e2e-tests/tika-grpc
mvn test -Dtest=FileSystemFetcherTest
mvn test -Dtest=IgniteConfigStoreTest
```

## Review Focus Areas

Please pay special attention to:

- [ ] **Module structure**: Verify tika-e2e-tests is properly standalone and NOT referenced in main pom.xml
- [ ] **POM structure**: Parent/child POM relationship and dependency management
- [ ] **Documentation**: README files are clear and accurate
- [ ] **Apache headers**: All new files have proper Apache License headers
- [ ] **Integration**: Tests can build independently without affecting main Tika build

## Critical Files to Review

- `tika-e2e-tests/pom.xml` - Parent POM for E2E tests
- `tika-e2e-tests/README.md` - Overview and usage instructions
- `tika-e2e-tests/tika-grpc/pom.xml` - gRPC E2E test module POM
- `tika-e2e-tests/tika-grpc/README.md` - gRPC-specific documentation

## Prerequisites for Testing

- Docker and Docker Compose installed
- Build tika-grpc Docker image: `apache/tika-grpc:local`
- Java 17+
- Maven 3.6+

## Related Tickets

- Parent: [TIKA-4599](https://issues.apache.org/jira/browse/TIKA-4599) - Add E2E tests for Tika
- Future: [TIKA-4601](https://issues.apache.org/jira/browse/TIKA-4601) - Add E2E tests for tika-server
- Future: [TIKA-4602](https://issues.apache.org/jira/browse/TIKA-4602) - Add E2E tests for tika CLI
- Future: [TIKA-4603](https://issues.apache.org/jira/browse/TIKA-4603) - Integrate into release pipeline
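The prerequisites listed in this description (Docker, Docker Compose, the `apache/tika-grpc:local` image, Java 17+, Maven 3.6+) could be checked before a run with a small shell helper. The following is only a sketch and not part of this PR: `version_ge` and `check_prereqs` are hypothetical names, and it assumes the Compose v2 plugin (`docker compose`) and the usual output format of `java -version` and `mvn -v`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the E2E prerequisites; not part of the PR.

# version_ge HAVE WANT: succeed if dotted version HAVE >= WANT.
# Anything after the leading digits-and-dots part (e.g. "_292") is ignored,
# and missing fields count as 0.
version_ge() {
  local have=${1%%[!0-9.]*} want=${2%%[!0-9.]*}
  local -a a b
  local i x y
  IFS=. read -r -a a <<< "$have"
  IFS=. read -r -a b <<< "$want"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    # 10# forces base 10 so fields with leading zeros are not read as octal.
    ((10#$x > 10#$y)) && return 0
    ((10#$x < 10#$y)) && return 1
  done
  return 0
}

# check_prereqs: report every missing prerequisite on stderr,
# return non-zero if at least one is missing.
check_prereqs() {
  local status=0 java_v mvn_v

  command -v docker >/dev/null 2>&1 ||
    { echo "missing: docker" >&2; status=1; }
  # Assumption: Compose v2 plugin; a legacy install would use docker-compose.
  docker compose version >/dev/null 2>&1 ||
    { echo "missing: Docker Compose" >&2; status=1; }
  docker image inspect apache/tika-grpc:local >/dev/null 2>&1 ||
    { echo "missing image: apache/tika-grpc:local" >&2; status=1; }

  # Assumption: java prints a line containing: version "17.0.2"
  java_v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  version_ge "$java_v" 17 ||
    { echo "need Java 17+, found '${java_v:-none}'" >&2; status=1; }

  # Assumption: mvn prints a line starting with: Apache Maven 3.9.6
  mvn_v=$(mvn -v 2>/dev/null | sed -n 's/^Apache Maven \([0-9.]*\).*/\1/p' | head -n 1)
  version_ge "$mvn_v" 3.6 ||
    { echo "need Maven 3.6+, found '${mvn_v:-none}'" >&2; status=1; }

  return "$status"
}
```

With the functions sourced into a shell, running `check_prereqs && mvn test` from `tika-e2e-tests/` would start the tests only when nothing is reported missing. The version comparison is kept as a separate function so it can be exercised without Docker or Maven installed.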
