nddipiazza opened a new pull request, #2500:
URL: https://github.com/apache/tika/pull/2500

   ## JIRA Ticket
   https://issues.apache.org/jira/browse/TIKA-4600
   
   ## Summary
   Adds end-to-end (E2E) tests for tika-grpc in a new standalone `tika-e2e-tests` module. This integrates the existing tika-grpc-e2e-test implementation into the main Tika repository.
   
   ## Changes
   - **Created `tika-e2e-tests/` standalone module** (NOT included in parent POM)
     - Independent build lifecycle - can be built/tested separately
     - Won't slow down main Tika build
     - Can be integrated into CI/CD as separate step
   - **Integrated tika-grpc E2E tests** as `tika-e2e-tests/tika-grpc`
     - Migrated from external tika-grpc-e2e-test repository
     - Tests use Testcontainers and Docker Compose
     - Validates gRPC server startup, document parsing, metadata extraction
     - Tests filesystem fetcher and Ignite config store
   - **Added parent POM** with shared dependency management
   - **Included sample configurations** for OCR, GROBID, NER, vision, etc.
   - **Added comprehensive documentation** for running E2E tests
   
   ## Structure
   ```
   tika/
   ├── tika-app/
   ├── tika-core/
   ├── ... (other modules)
   ├── tika-e2e-tests/           ← NEW (standalone, not in parent POM)
   │   ├── pom.xml               ← Parent POM for all E2E tests
   │   ├── README.md
   │   └── tika-grpc/            ← First E2E test module
   │       ├── pom.xml
   │       ├── README.md
   │       ├── sample-configs/
   │       └── src/test/
   └── pom.xml                   ← Does NOT reference tika-e2e-tests
   ```
   
   ## Why Standalone?
   The E2E tests are intentionally NOT part of the main build because they:
   - Require Docker and Testcontainers
   - Take significantly longer than unit tests
   - Need external resources (GovDocs1 corpus, Docker images)
   - Should be run selectively by developers
   - Can be integrated into release pipeline as separate step
   
   ## Testing
   Build and run E2E tests:
   ```bash
   cd tika-e2e-tests
   mvn clean install
   mvn test
   ```
   
   Run specific tests:
   ```bash
   cd tika-e2e-tests/tika-grpc
   mvn test -Dtest=FileSystemFetcherTest
   mvn test -Dtest=IgniteConfigStoreTest
   ```
   
   ## Review Focus Areas
   Please pay special attention to:
   - [ ] **Module structure**: Verify tika-e2e-tests is properly standalone and NOT referenced in main pom.xml
   - [ ] **POM structure**: Parent/child POM relationship and dependency management
   - [ ] **Documentation**: README files are clear and accurate
   - [ ] **Apache headers**: All new files have proper Apache License headers
   - [ ] **Integration**: Tests can build independently without affecting main Tika build
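
The module-structure and Apache-header items above can be spot-checked from the repository root. The following is a minimal sketch, not part of this PR: the function names are hypothetical, and the header phrase matched is an assumption about how the license header begins in `.java` and `.xml` files.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the matched header phrase are assumptions.

# Succeeds (exit 0) if the given POM lists tika-e2e-tests as a <module>.
references_e2e_module() {
  local pom="$1"
  grep -Eq '<module>[[:space:]]*tika-e2e-tests[[:space:]]*</module>' "$pom"
}

# Prints each .java or .xml file under the directory that lacks the phrase
# "Licensed to the Apache Software Foundation".
files_missing_license() {
  local dir="$1"
  grep -rL --include='*.java' --include='*.xml' \
    'Licensed to the Apache Software Foundation' "$dir"
}
```

For example, `references_e2e_module pom.xml && echo "main POM references tika-e2e-tests"` should print nothing if the module is standalone, and `files_missing_license tika-e2e-tests` should list no files if every source and POM file carries the header.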
   
   ## Critical Files to Review
   - `tika-e2e-tests/pom.xml` - Parent POM for E2E tests
   - `tika-e2e-tests/README.md` - Overview and usage instructions
   - `tika-e2e-tests/tika-grpc/pom.xml` - gRPC E2E test module POM
   - `tika-e2e-tests/tika-grpc/README.md` - gRPC-specific documentation
   
   ## Prerequisites for Testing
   - Docker and Docker Compose installed
   - A locally built tika-grpc Docker image tagged `apache/tika-grpc:local`
   - Java 17+
   - Maven 3.6+
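
The prerequisites above could be verified before a test run with a small script. This is a sketch under stated assumptions: `version_ge` and `check_prereqs` are hypothetical helpers (not shipped with this PR), `sort -V` requires GNU coreutils, and the parsing assumes the usual first-line output of `java -version` and `mvn -v`.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the version floors and the
# image tag are taken from the prerequisites list.

# Succeeds if dotted version $1 is >= dotted version $2 (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reports each missing prerequisite on stderr; returns non-zero if any fails.
check_prereqs() {
  local ok=0 java_ver mvn_ver
  command -v docker >/dev/null 2>&1 || { echo "missing: docker" >&2; ok=1; }
  docker compose version >/dev/null 2>&1 || docker-compose version >/dev/null 2>&1 ||
    { echo "missing: Docker Compose" >&2; ok=1; }
  docker image inspect apache/tika-grpc:local >/dev/null 2>&1 ||
    { echo "missing image: apache/tika-grpc:local" >&2; ok=1; }
  # Extract the numeric part of e.g. `openjdk version "17.0.9"`.
  java_ver=$(java -version 2>&1 | sed -n 's/.*version "\([0-9][0-9.]*\).*/\1/p' | head -n 1)
  version_ge "${java_ver:-0}" 17 ||
    { echo "need Java 17+, found: ${java_ver:-none}" >&2; ok=1; }
  # Extract the numeric part of e.g. `Apache Maven 3.9.6 (...)`.
  mvn_ver=$(mvn -v 2>/dev/null | sed -n 's/.*Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1)
  version_ge "${mvn_ver:-0}" 3.6 ||
    { echo "need Maven 3.6+, found: ${mvn_ver:-none}" >&2; ok=1; }
  return "$ok"
}
```

After sourcing the script, a run might then be gated as `check_prereqs && (cd tika-e2e-tests && mvn clean install)`.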
   
   ## Related Tickets
   - Parent: [TIKA-4599](https://issues.apache.org/jira/browse/TIKA-4599) - Add E2E tests for Tika
   - Future: [TIKA-4601](https://issues.apache.org/jira/browse/TIKA-4601) - Add E2E tests for tika-server
   - Future: [TIKA-4602](https://issues.apache.org/jira/browse/TIKA-4602) - Add E2E tests for tika CLI
   - Future: [TIKA-4603](https://issues.apache.org/jira/browse/TIKA-4603) - Integrate into release pipeline


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
