nddipiazza opened a new pull request, #2504: URL: https://github.com/apache/tika/pull/2504
## JIRA Ticket

https://issues.apache.org/jira/browse/TIKA-4605

## Summary

Adds a new Google Drive fetcher plugin for Apache Tika pipes that enables fetching content from Google Drive using OAuth2/service account authentication.

## Changes

- **Created plugin module**: `tika-pipes-google-drive` under `tika-pipes/tika-pipes-plugins/`
- **Implemented classes**:
  - `GoogleDriveFetcher` - Main fetcher with OAuth2 authentication
  - `GoogleDriveFetcherFactory` - Factory for creating fetcher instances
  - `GoogleDriveFetcherConfig` - Configuration class with JSON support
  - `GoogleDrivePipesPlugin` - PF4J plugin wrapper
- **Configuration files**:
  - `pom.xml` with Google API dependencies
  - `plugin.properties` for PF4J
  - `assembly.xml` for ZIP packaging
- **Updated parent `pom.xml`** to include new module

## Architecture

The implementation follows Apache Tika's plugin pattern (same as TIKA-4604):

- Extends `AbstractTikaExtension`
- Uses `ExtensionConfig` for JSON configuration
- Implements `Fetcher` interface with `Metadata` parameters
- Static `build()` method for instantiation
- Proper initialization pattern

## Dependencies

- Google Drive API (v3-rev20241027-2.0.0)
- Google Auth Library OAuth2 HTTP (1.30.0)
- Google API Client (1.33.0)
- Dependency management added for version convergence

## Configuration Example

```json
{
  "fetchers": {
    "google-drive-fetcher": {
      "my-drive": {
        "serviceAccountKeyBase64": "<base64-encoded-key>",
        "subjectUser": "[email protected]",
        "applicationName": "tika-pipes",
        "spoolToTemp": true,
        "throttleSeconds": [1, 5, 10]
      }
    }
  }
}
```

## Testing

✅ Code compiles successfully:

```bash
mvn clean install -DskipTests -pl tika-pipes/tika-pipes-plugins/tika-pipes-google-drive -am
```

✅ Code formatted with spotless
✅ All checkstyle checks pass

## Source

Ported from: https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
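## Note on `throttleSeconds`

The configuration example lists `"throttleSeconds": [1, 5, 10]` without describing how the values are applied. One plausible reading, offered here as an assumption and not as the behaviour of `GoogleDriveFetcher`, is that each entry is the delay in seconds before the next retry of a failed fetch. The sketch below illustrates that reading with the standard library only. `ThrottleRetrySketch`, `callWithThrottle`, and `Sleeper` are hypothetical names that do not appear in this PR, and treating every `IOException` as a throttling signal is a simplification.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical illustration; none of these names come from the PR.
public class ThrottleRetrySketch {

    /** Abstraction over sleeping so the delay can be observed without waiting. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Runs the action; on IOException, waits for the next configured delay
     * and tries again. Once every delay has been used, the last failure is
     * rethrown to the caller.
     */
    public static <T> T callWithThrottle(Callable<T> action,
                                         List<Long> throttleSeconds,
                                         Sleeper sleeper) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return action.call();
            } catch (IOException e) {
                if (attempt >= throttleSeconds.size()) {
                    throw e; // all configured delays exhausted
                }
                sleeper.sleep(throttleSeconds.get(attempt) * 1000L);
                attempt++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch that fails twice before succeeding.
        String result = callWithThrottle(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated rate limit");
            }
            return "fetched";
        }, List.of(1L, 5L, 10L),
           millis -> System.out.println("would sleep " + millis + " ms"));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real caller would pass `Thread::sleep` as the `Sleeper`; the demo prints the delay instead so it finishes immediately. Under this reading, `[1, 5, 10]` allows up to four attempts in total: the initial call plus one retry per listed delay.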
