[
https://issues.apache.org/jira/browse/TIKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048013#comment-18048013
]
ASF GitHub Bot commented on TIKA-4600:
--------------------------------------
nddipiazza opened a new pull request, #2500:
URL: https://github.com/apache/tika/pull/2500
## JIRA Ticket
https://issues.apache.org/jira/browse/TIKA-4600
## Summary
Adds end-to-end (E2E) tests for tika-grpc in a new standalone
`tika-e2e-tests` module. This integrates the existing tika-grpc-e2e-test
implementation into the main Tika repository.
## Changes
- **Created `tika-e2e-tests/` standalone module** (NOT included in the parent
  POM)
  - Independent build lifecycle: can be built and tested separately
  - Does not slow down the main Tika build
  - Can be integrated into CI/CD as a separate step
- **Integrated tika-grpc E2E tests** as `tika-e2e-tests/tika-grpc`
  - Migrated from the external `tika-grpc-e2e-test` repository
  - Tests use Testcontainers and Docker Compose
  - Validate gRPC server startup, document parsing, and metadata extraction
  - Cover the filesystem fetcher and the Ignite config store
- **Added a parent POM** with shared dependency management
- **Included sample configurations** for OCR, GROBID, NER, vision, etc.
- **Added README documentation** for running the E2E tests
## Structure
```
tika/
├── tika-app/
├── tika-core/
├── ... (other modules)
├── tika-e2e-tests/          ← NEW (standalone, not in parent POM)
│   ├── pom.xml              ← Parent POM for all E2E tests
│   ├── README.md
│   └── tika-grpc/           ← First E2E test module
│       ├── pom.xml
│       ├── README.md
│       ├── sample-configs/
│       └── src/test/
└── pom.xml                  ← Does NOT reference tika-e2e-tests
```
## Why Standalone?
The E2E tests are intentionally NOT part of the main build because they:
- Require Docker and Testcontainers
- Take significantly longer than unit tests
- Need external resources (GovDocs1 corpus, Docker images)
- Should be run selectively by developers
- Can be integrated into the release pipeline as a separate step
## Testing
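Build and run all E2E tests:
```bash
cd tika-e2e-tests
mvn clean install   # builds all modules and runs the tests (install includes the test phase)
mvn test            # re-runs the tests without reinstalling
```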
Run specific tests:
```bash
cd tika-e2e-tests/tika-grpc
mvn test -Dtest=FileSystemFetcherTest
mvn test -Dtest=IgniteConfigStoreTest
```
## Review Focus Areas
Please pay special attention to:
- [ ] **Module structure**: Verify that `tika-e2e-tests` is standalone and
  NOT referenced in the root `pom.xml`
- [ ] **POM structure**: Parent/child POM relationship and dependency
  management
- [ ] **Documentation**: README files are clear and accurate
- [ ] **Apache headers**: All new files have proper Apache License headers
- [ ] **Integration**: E2E tests build independently without affecting the
  main Tika build
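The module-structure item can be checked mechanically. The following is a minimal POSIX shell sketch, not part of this PR: the function name `assert_not_a_module` is hypothetical, and it assumes each module is declared on a single line (e.g. `<module>tika-e2e-tests</module>`) in the root `pom.xml`.

```bash
# Hypothetical helper (not part of the PR): fails if the given POM lists
# tika-e2e-tests as a <module>. Assumes each <module> entry is on one line.
assert_not_a_module() {
  pom="$1"
  if grep -q '<module>[[:space:]]*tika-e2e-tests[[:space:]]*</module>' "$pom"; then
    echo "ERROR: $pom lists tika-e2e-tests as a module" >&2
    return 1
  fi
  echo "OK: $pom does not list tika-e2e-tests as a module"
}
```

Run it from the repository root as `assert_not_a_module pom.xml`. A `grep` check is cruder than parsing the POM, but it is enough to catch an accidental `<module>` entry.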
## Critical Files to Review
- `tika-e2e-tests/pom.xml` - Parent POM for E2E tests
- `tika-e2e-tests/README.md` - Overview and usage instructions
- `tika-e2e-tests/tika-grpc/pom.xml` - gRPC E2E test module POM
- `tika-e2e-tests/tika-grpc/README.md` - gRPC-specific documentation
## Prerequisites for Testing
- Docker and Docker Compose installed
- A locally built tika-grpc Docker image tagged `apache/tika-grpc:local`
- Java 17+
- Maven 3.6+
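A quick preflight before running the suite could look like the sketch below. It is hypothetical (not part of the PR) and only checks that the named tools are on `PATH`; it does not verify the Java 17+ or Maven 3.6+ version requirements.

```bash
# Hypothetical preflight helper (not part of the PR): reports every tool
# named as an argument that cannot be found on PATH.
# Returns 0 if all tools are present, 1 otherwise.
preflight() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `preflight docker java mvn && docker image inspect apache/tika-grpc:local >/dev/null` confirms that the tools are installed and that the locally built image exists (`docker image inspect` exits non-zero when the image is absent).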
## Related Tickets
- Parent: [TIKA-4599](https://issues.apache.org/jira/browse/TIKA-4599) - Add
E2E tests for Tika
- Future: [TIKA-4601](https://issues.apache.org/jira/browse/TIKA-4601) - Add
E2E tests for tika-server
- Future: [TIKA-4602](https://issues.apache.org/jira/browse/TIKA-4602) - Add
E2E tests for tika CLI
- Future: [TIKA-4603](https://issues.apache.org/jira/browse/TIKA-4603) -
Integrate into release pipeline
> Add E2E tests for tika-grpc
> ---------------------------
>
> Key: TIKA-4600
> URL: https://issues.apache.org/jira/browse/TIKA-4600
> Project: Tika
> Issue Type: Sub-task
> Reporter: Nicholas DiPiazza
> Assignee: Nicholas DiPiazza
> Priority: Major
>
> h2. Overview
> Add end-to-end tests for tika-grpc to the new tika-e2e-tests module.
> h2. Implementation Details
> * Move existing tika-grpc-e2e-test implementation into
> tika-e2e-tests/tika-grpc
> * Tests should validate:
> ** gRPC server startup and shutdown
> ** Document parsing via gRPC endpoints
> ** Metadata extraction
> ** Error handling
> ** Performance characteristics
> * Use Docker containers for test isolation
> * Ensure tests can run in CI/CD environment
> h2. Acceptance Criteria
> * E2E tests for tika-grpc are integrated into tika-e2e-tests module
> * Tests pass in local development environment
> * Tests pass in CI/CD pipeline
> * Documentation updated with how to run E2E tests