[ https://issues.apache.org/jira/browse/TIKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048013#comment-18048013 ]

ASF GitHub Bot commented on TIKA-4600:
--------------------------------------

nddipiazza opened a new pull request, #2500:
URL: https://github.com/apache/tika/pull/2500

   ## JIRA Ticket
   https://issues.apache.org/jira/browse/TIKA-4600
   
   ## Summary
   Adds end-to-end (E2E) tests for tika-grpc in a new standalone `tika-e2e-tests` module. This integrates the existing tika-grpc-e2e-test implementation into the main Tika repository.
   
   ## Changes
   - **Created `tika-e2e-tests/` standalone module** (NOT included in the root Tika POM)
     - Independent build lifecycle: can be built and tested separately
     - Does not slow down the main Tika build
     - Can be integrated into CI/CD as a separate step
   - **Integrated tika-grpc E2E tests** as `tika-e2e-tests/tika-grpc`
     - Migrated from the external tika-grpc-e2e-test repository
     - Tests use Testcontainers and Docker Compose
     - Validate gRPC server startup, document parsing, and metadata extraction
     - Cover the filesystem fetcher and the Ignite config store
   - **Added parent POM** (`tika-e2e-tests/pom.xml`) with shared dependency management
   - **Included sample configurations** for OCR, GROBID, NER, vision, etc.
   - **Added README documentation** for running the E2E tests
   
   ## Structure
   ```
   tika/
   ├── tika-app/
   ├── tika-core/
   ├── ... (other modules)
   ├── tika-e2e-tests/           ← NEW (standalone, not in root POM)
   │   ├── pom.xml               ← Parent POM for all E2E tests
   │   ├── README.md
   │   └── tika-grpc/            ← First E2E test module
   │       ├── pom.xml
   │       ├── README.md
   │       ├── sample-configs/
   │       └── src/test/
   └── pom.xml                   ← Does NOT reference tika-e2e-tests
   ```
   
   ## Why Standalone?
   The E2E tests are intentionally NOT part of the main build because they:
   - Require Docker and Testcontainers
   - Take significantly longer than unit tests
   - Need external resources (GovDocs1 corpus, Docker images)
   - Should be run selectively by developers
   - Can be integrated into the release pipeline as a separate step
   
   ## Testing
   Build and run the E2E tests (note that `mvn clean install` already executes the `test` phase):
   ```bash
   cd tika-e2e-tests
   mvn clean install
   mvn test
   ```
   
   Run specific tests:
   ```bash
   cd tika-e2e-tests/tika-grpc
   mvn test -Dtest=FileSystemFetcherTest
   mvn test -Dtest=IgniteConfigStoreTest
   ```
   
   ## Review Focus Areas
   Please pay special attention to:
   - [ ] **Module structure**: Verify tika-e2e-tests is properly standalone and NOT referenced in the main pom.xml
   - [ ] **POM structure**: Parent/child POM relationship and dependency management
   - [ ] **Documentation**: README files are clear and accurate
   - [ ] **Apache headers**: All new files have proper Apache License headers
   - [ ] **Integration**: Tests build independently without affecting the main Tika build
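
   The **Module structure** item can be checked mechanically. The following sketch is a hypothetical review helper and is not part of this PR. It is a Bash function, illustratively named `e2e_is_standalone`, that greps a POM for a `<module>tika-e2e-tests</module>` entry and returns non-zero if it finds one.

   ```bash
   # Hypothetical review helper (not part of PR #2500).
   # Exit status: 0 if the POM does not list tika-e2e-tests as a <module>,
   # 1 if it does, 2 if the POM file is missing.
   e2e_is_standalone() {
     local pom="${1:-pom.xml}"
     if [[ ! -f "$pom" ]]; then
       echo "ERROR: $pom not found" >&2
       return 2
     fi
     # Line-based match, tolerating whitespace inside the <module> element.
     if grep -Eq '<module>[[:space:]]*tika-e2e-tests[[:space:]]*</module>' "$pom"; then
       echo "FAIL: $pom lists tika-e2e-tests as a module" >&2
       return 1
     fi
     echo "OK: $pom does not reference tika-e2e-tests"
   }
   ```

   Run it from the repository root as `e2e_is_standalone pom.xml`. It uses a line-based grep, not an XML parse. As a result, it also flags commented-out entries and entries inside `<profile>` sections, and it misses a `<module>` element split across lines. Treat the result as a review hint, not a proof.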
   
   ## Critical Files to Review
   - `tika-e2e-tests/pom.xml` - Parent POM for E2E tests
   - `tika-e2e-tests/README.md` - Overview and usage instructions
   - `tika-e2e-tests/tika-grpc/pom.xml` - gRPC E2E test module POM
   - `tika-e2e-tests/tika-grpc/README.md` - gRPC-specific documentation
   
   ## Prerequisites for Testing
   - Docker and Docker Compose installed
   - A locally built tika-grpc Docker image tagged `apache/tika-grpc:local`
   - Java 17+
   - Maven 3.6+
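
   You can verify the prerequisites above before running `mvn test`. The following is a hypothetical pre-flight sketch and is not shipped with this PR. It assumes Bash and assumes that `docker`, `java`, and `mvn` are on `PATH`; all function names are illustrative. The version-parsing helpers are kept separate from the tool calls, so they can be exercised without Docker or a JDK installed.

   ```bash
   # Hypothetical pre-flight check for the E2E prerequisites (not part of the PR).

   # Succeeds if dotted numeric version $1 >= $2, compared component by component
   # (missing components count as 0, so 3.6 == 3.6.0).
   version_ge() {
     local IFS=.
     local -a a=($1) b=($2)
     local i x y
     for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
       x=${a[i]:-0} y=${b[i]:-0}
       ((10#$x > 10#$y)) && return 0
       ((10#$x < 10#$y)) && return 1
     done
     return 0
   }

   # Feature version from a `java -version` line, e.g.
   # 'openjdk version "17.0.2" 2022-01-18' -> 17, 'java version "1.8.0_292"' -> 8.
   java_major() {
     local v
     v=$(sed -E 's/.*version "([^"]+)".*/\1/' <<<"$1")
     [[ $v == 1.* ]] && v=${v#1.}   # legacy 1.x numbering
     echo "${v%%[!0-9]*}"           # keep leading digits only
   }

   # Version from the first line of `mvn -v`, e.g. 'Apache Maven 3.9.6 (...)' -> 3.9.6.
   maven_version() {
     awk 'NR == 1 { print $3 }' <<<"$1"
   }

   # Reports each missing prerequisite on stderr; returns 1 if any check fails.
   check_prereqs() {
     local rc=0 java_v mvn_v
     command -v docker >/dev/null || { echo "missing: docker" >&2; rc=1; }
     if ! docker compose version >/dev/null 2>&1 && ! command -v docker-compose >/dev/null; then
       echo "missing: Docker Compose" >&2; rc=1
     fi
     docker image inspect apache/tika-grpc:local >/dev/null 2>&1 \
       || { echo "missing image: apache/tika-grpc:local" >&2; rc=1; }
     java_v=$(java_major "$(java -version 2>&1 | grep -m 1 'version "')")
     version_ge "${java_v:-0}" 17 || { echo "need Java 17+, found: ${java_v:-none}" >&2; rc=1; }
     mvn_v=$(maven_version "$(mvn -B -v 2>/dev/null)")
     version_ge "${mvn_v:-0}" 3.6 || { echo "need Maven 3.6+, found: ${mvn_v:-none}" >&2; rc=1; }
     return "$rc"
   }
   ```

   Source the script and run `check_prereqs`. It prints one line per missing prerequisite and returns non-zero if any check fails. It only checks that the `apache/tika-grpc:local` image exists locally; it does not build the image.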
   
   ## Related Tickets
   - Parent: [TIKA-4599](https://issues.apache.org/jira/browse/TIKA-4599) - Add E2E tests for Tika
   - Future: [TIKA-4601](https://issues.apache.org/jira/browse/TIKA-4601) - Add E2E tests for tika-server
   - Future: [TIKA-4602](https://issues.apache.org/jira/browse/TIKA-4602) - Add E2E tests for tika CLI
   - Future: [TIKA-4603](https://issues.apache.org/jira/browse/TIKA-4603) - Integrate into release pipeline




> Add E2E tests for tika-grpc
> ---------------------------
>
>                 Key: TIKA-4600
>                 URL: https://issues.apache.org/jira/browse/TIKA-4600
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Nicholas DiPiazza
>            Assignee: Nicholas DiPiazza
>            Priority: Major
>
> h2. Overview
> Add end-to-end tests for tika-grpc to the new tika-e2e-tests module.
> h2. Implementation Details
> * Move existing tika-grpc-e2e-test implementation into tika-e2e-tests/tika-grpc
> * Tests should validate:
> ** gRPC server startup and shutdown
> ** Document parsing via gRPC endpoints
> ** Metadata extraction
> ** Error handling
> ** Performance characteristics
> * Use Docker containers for test isolation
> * Ensure tests can run in CI/CD environment
> h2. Acceptance Criteria
> * E2E tests for tika-grpc are integrated into tika-e2e-tests module
> * Tests pass in local development environment
> * Tests pass in CI/CD pipeline
> * Documentation updated with how to run E2E tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
