[
https://issues.apache.org/jira/browse/TIKA-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048209#comment-18048209
]
ASF GitHub Bot commented on TIKA-4605:
--------------------------------------
nddipiazza opened a new pull request, #2504:
URL: https://github.com/apache/tika/pull/2504
## JIRA Ticket
https://issues.apache.org/jira/browse/TIKA-4605
## Summary
Adds a new Google Drive fetcher plugin for Apache Tika pipes that enables
fetching content from Google Drive using OAuth2/service account authentication.
## Changes
- **Created plugin module**: `tika-pipes-google-drive` under
  `tika-pipes/tika-pipes-plugins/`
- **Implemented classes**:
  - `GoogleDriveFetcher` - Main fetcher with OAuth2 authentication
  - `GoogleDriveFetcherFactory` - Factory for creating fetcher instances
  - `GoogleDriveFetcherConfig` - Configuration class with JSON support
  - `GoogleDrivePipesPlugin` - PF4J plugin wrapper
- **Configuration files**:
  - `pom.xml` with Google API dependencies
  - `plugin.properties` for PF4J
  - `assembly.xml` for ZIP packaging
- **Updated parent `pom.xml`** to include new module
## Architecture
The implementation follows Apache Tika's plugin pattern (same as TIKA-4604):
- Extends `AbstractTikaExtension`
- Uses `ExtensionConfig` for JSON configuration
- Implements `Fetcher` interface with `Metadata` parameters
- Static `build()` method for instantiation
- Proper initialization pattern
## Dependencies
- Google Drive API (v3-rev20241027-2.0.0)
- Google Auth Library OAuth2 HTTP (1.30.0)
- Google API Client (1.33.0)
- Dependency management added for version convergence
## Configuration Example
```json
{
  "fetchers": {
    "google-drive-fetcher": {
      "my-drive": {
        "serviceAccountKeyBase64": "<base64-encoded-key>",
        "subjectUser": "<user-email>",
        "applicationName": "tika-pipes",
        "spoolToTemp": true,
        "throttleSeconds": [1, 5, 10]
      }
    }
  }
}
```
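Two fields in the example above are easy to misread, so here is a minimal Java sketch of one plausible interpretation. It is not taken from the plugin source. The class `DriveConfigSketch` and its methods are hypothetical names, and the sketch rests on two assumptions: `serviceAccountKeyBase64` holds the service account key JSON in standard (not URL-safe) Base64, and `throttleSeconds` lists the waits, in seconds, between successive retry attempts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the plugin; illustrates the assumed field semantics.
final class DriveConfigSketch {

    /** Encodes raw service account key JSON into a value for serviceAccountKeyBase64. */
    static String encodeKey(String keyJson) {
        return Base64.getEncoder().encodeToString(keyJson.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes the configured value back to key JSON (assumes standard Base64). */
    static String decodeKey(String keyBase64) {
        return new String(Base64.getDecoder().decode(keyBase64), StandardCharsets.UTF_8);
    }

    /**
     * Runs the action. After each failure, waits the next delay from throttleSeconds
     * and tries again. When the delays are used up, the last failure is rethrown.
     * Assumption: [1, 5, 10] means up to three retries after waits of 1 s, 5 s and 10 s.
     */
    static <T> T callWithThrottle(Callable<T> action, List<Integer> throttleSeconds)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= throttleSeconds.size(); attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < throttleSeconds.size()) {
                    Thread.sleep(throttleSeconds.get(attempt) * 1000L);
                }
            }
        }
        throw last;
    }
}
```

Under these assumptions, the configuration value could be produced with `DriveConfigSketch.encodeKey(...)` applied to the contents of the downloaded key file, and a fetch call could be wrapped in `callWithThrottle` with the configured list. The actual behaviour of both fields should be checked against `GoogleDriveFetcher` and `GoogleDriveFetcherConfig` in the pull request.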
## Testing
✅ Code compiles successfully:
```bash
mvn clean install -DskipTests -pl tika-pipes/tika-pipes-plugins/tika-pipes-google-drive -am
```
✅ Code formatted with spotless
✅ All checkstyle checks pass
## Source
Ported from:
https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
> Add Google Drive fetcher plugin
> -------------------------------
>
> Key: TIKA-4605
> URL: https://issues.apache.org/jira/browse/TIKA-4605
> Project: Tika
> Issue Type: New Feature
> Reporter: Nicholas DiPiazza
> Assignee: Nicholas DiPiazza
> Priority: Major
>
> h2. Overview
> Port the Google Drive fetcher from the external tika-pipes repository as a
> new Tika plugin. This fetcher enables fetching content from Google Drive
> using OAuth2 authentication.
> h2. Implementation Details
> * Port code from:
> https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
> * Create new plugin module:
> *tika-pipes/tika-pipes-plugins/tika-pipes-google-drive*
> * Implement as a standard Tika pipes plugin (following plugin architecture)
> * Support OAuth2 authentication for Google Drive
> * Include appropriate dependencies and configuration
> h2. Features
> * Fetch files from Google Drive
> * OAuth2 token-based authentication
> * Support for Google Drive API
> * Configurable service account credentials
> * Error handling and retry logic
> h2. Acceptance Criteria
> * Google Drive fetcher integrated as a Tika plugin
> * Plugin follows standard Tika plugin architecture (like TIKA-4604)
> * Configuration supports OAuth2/service account authentication
> * Code follows Apache Tika patterns (extends AbstractTikaExtension, uses
> Metadata)
> * All existing tests pass
> * forbiddenapis check passes
> * Plugin can be loaded dynamically by tika-grpc
> h2. Reference
> * External implementation:
> https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
> * Similar implementation: TIKA-4604 (Atlassian JWT fetcher)