[ https://issues.apache.org/jira/browse/TIKA-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048209#comment-18048209 ]

ASF GitHub Bot commented on TIKA-4605:
--------------------------------------

nddipiazza opened a new pull request, #2504:
URL: https://github.com/apache/tika/pull/2504

   ## JIRA Ticket
   https://issues.apache.org/jira/browse/TIKA-4605
   
   ## Summary
   Adds a new Google Drive fetcher plugin for Apache Tika pipes that enables fetching content from Google Drive using OAuth2/service account authentication.
   
   ## Changes
   - **Created plugin module**: `tika-pipes-google-drive` under `tika-pipes/tika-pipes-plugins/`
   - **Implemented classes**:
     - `GoogleDriveFetcher` - Main fetcher with OAuth2 authentication
     - `GoogleDriveFetcherFactory` - Factory for creating fetcher instances
     - `GoogleDriveFetcherConfig` - Configuration class with JSON support
     - `GoogleDrivePipesPlugin` - PF4J plugin wrapper
   - **Configuration files**:
     - `pom.xml` with Google API dependencies
     - `plugin.properties` for PF4J
     - `assembly.xml` for ZIP packaging
   - **Updated parent `pom.xml`** to include new module
   
   ## Architecture
   The implementation follows Apache Tika's plugin pattern (same as TIKA-4604):
   - Extends `AbstractTikaExtension` 
   - Uses `ExtensionConfig` for JSON configuration
   - Implements `Fetcher` interface with `Metadata` parameters
   - Static `build()` method for instantiation
   - Proper initialization pattern
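   For readers unfamiliar with the pattern, here is a minimal plain-JDK sketch of a config class whose static `build()` validates and converts every value up front. `DriveConfigSketch`, its map-based `build()`, the `tika-pipes` default for `applicationName`, and reading `subjectUser` as the account impersonated through domain-wide delegation are assumptions for illustration; the real `GoogleDriveFetcherConfig` is populated from JSON through `ExtensionConfig`, which is not modeled here. Field names mirror the keys in the configuration example.

   ```java
   import java.util.ArrayList;
   import java.util.Base64;
   import java.util.List;
   import java.util.Map;

   // Hypothetical stand-in for GoogleDriveFetcherConfig; only the field names
   // (taken from the JSON example) come from the PR, nothing else from Tika.
   final class DriveConfigSketch {
       final byte[] serviceAccountKeyJson; // decoded service-account key file
       final String subjectUser;           // assumed: user impersonated via delegation
       final String applicationName;
       final boolean spoolToTemp;
       final List<Integer> throttleSeconds;

       private DriveConfigSketch(byte[] serviceAccountKeyJson, String subjectUser,
                                 String applicationName, boolean spoolToTemp,
                                 List<Integer> throttleSeconds) {
           this.serviceAccountKeyJson = serviceAccountKeyJson;
           this.subjectUser = subjectUser;
           this.applicationName = applicationName;
           this.spoolToTemp = spoolToTemp;
           this.throttleSeconds = List.copyOf(throttleSeconds);
       }

       // Static factory: validate and convert raw values once, so a fetcher
       // built from the result never sees a half-initialized config.
       static DriveConfigSketch build(Map<String, Object> raw) {
           Object keyB64 = raw.get("serviceAccountKeyBase64");
           if (!(keyB64 instanceof String) || ((String) keyB64).isBlank()) {
               throw new IllegalArgumentException("serviceAccountKeyBase64 is required");
           }
           // Throws IllegalArgumentException if the value is not valid Base64.
           byte[] key = Base64.getDecoder().decode(((String) keyB64).trim());

           List<Integer> throttle = new ArrayList<>();
           Object rawThrottle = raw.get("throttleSeconds");
           if (rawThrottle instanceof List<?>) {
               for (Object o : (List<?>) rawThrottle) {
                   int seconds = ((Number) o).intValue();
                   if (seconds < 0) {
                       throw new IllegalArgumentException("throttleSeconds must be >= 0");
                   }
                   throttle.add(seconds);
               }
           }

           return new DriveConfigSketch(
                   key,
                   (String) raw.get("subjectUser"),
                   (String) raw.getOrDefault("applicationName", "tika-pipes"), // hypothetical default
                   Boolean.TRUE.equals(raw.get("spoolToTemp")),                // absent -> false (assumed)
                   throttle);
       }
   }
   ```

   Decoding the key and range-checking the delays inside `build()` means a malformed configuration fails when the plugin is set up rather than on the first fetch.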
   
   ## Dependencies
   - Google Drive API (v3-rev20241027-2.0.0)
   - Google Auth Library OAuth2 HTTP (1.30.0)
   - Google API Client (1.33.0)
   - Dependency management added for version convergence
   
   ## Configuration Example
   ```json
   {
     "fetchers": {
       "google-drive-fetcher": {
         "my-drive": {
           "serviceAccountKeyBase64": "<base64-encoded-key>",
           "subjectUser": "<user-email-to-impersonate>",
           "applicationName": "tika-pipes",
           "spoolToTemp": true,
           "throttleSeconds": [1, 5, 10]
         }
       }
     }
   }
   ```
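   The PR does not spell out how `throttleSeconds` is used. One plausible reading, assumed here rather than taken from the code, is a back-off schedule: wait 1 s, then 5 s, then 10 s between retries of a failed Drive request, then give up. A plain-JDK sketch of that reading, where `ThrottledRetry` and its `Sleeper` hook are hypothetical names:

   ```java
   import java.io.IOException;
   import java.util.List;
   import java.util.concurrent.Callable;

   // Hypothetical helper: retries an I/O action using throttleSeconds as a
   // fixed back-off schedule (assumed semantics, not confirmed by the PR).
   final class ThrottledRetry {
       // Indirection over Thread.sleep so tests can record delays instead of waiting.
       interface Sleeper {
           void sleepSeconds(int seconds) throws InterruptedException;
       }

       static final Sleeper REAL_SLEEP = seconds -> Thread.sleep(seconds * 1000L);

       // Makes 1 + throttleSeconds.size() attempts. Before retry i (1-based) it
       // waits throttleSeconds.get(i - 1) seconds. Only IOException triggers a
       // retry; once the schedule is exhausted the last IOException is rethrown.
       static <T> T call(Callable<T> action, List<Integer> throttleSeconds, Sleeper sleeper)
               throws Exception {
           IOException last = null;
           for (int attempt = 0; attempt <= throttleSeconds.size(); attempt++) {
               if (attempt > 0) {
                   sleeper.sleepSeconds(throttleSeconds.get(attempt - 1));
               }
               try {
                   return action.call();
               } catch (IOException e) {
                   last = e;
               }
           }
           throw last;
       }
   }
   ```

   In the real Google client, HTTP errors surface as `IOException` subclasses such as `GoogleJsonResponseException`, so a production version would more likely retry only on rate-limit status codes than on every I/O failure.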
   
   ## Testing
   ✅ Code compiles successfully:
   ```bash
   mvn clean install -DskipTests -pl tika-pipes/tika-pipes-plugins/tika-pipes-google-drive -am
   ```
   
   ✅ Code formatted with Spotless
   
   ✅ All Checkstyle checks pass
   
   ## Source
   Ported from: https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive




> Add Google Drive fetcher plugin
> -------------------------------
>
>                 Key: TIKA-4605
>                 URL: https://issues.apache.org/jira/browse/TIKA-4605
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Nicholas DiPiazza
>            Assignee: Nicholas DiPiazza
>            Priority: Major
>
> h2. Overview
> Port the Google Drive fetcher from the external tika-pipes repository as a 
> new Tika plugin. This fetcher enables fetching content from Google Drive 
> using OAuth2 authentication.
> h2. Implementation Details
> * Port code from: https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
> * Create new plugin module: *tika-pipes/tika-pipes-plugins/tika-pipes-google-drive*
> * Implement as a standard Tika pipes plugin (following plugin architecture)
> * Support OAuth2 authentication for Google Drive
> * Include appropriate dependencies and configuration
> h2. Features
> * Fetch files from Google Drive
> * OAuth2 token-based authentication
> * Support for Google Drive API
> * Configurable service account credentials
> * Error handling and retry logic
> h2. Acceptance Criteria
> * Google Drive fetcher integrated as a Tika plugin
> * Plugin follows standard Tika plugin architecture (like TIKA-4604)
> * Configuration supports OAuth2/service account authentication
> * Code follows Apache Tika patterns (extends AbstractTikaExtension, uses Metadata)
> * All existing tests pass
> * forbiddenapis check passes
> * Plugin can be loaded dynamically by tika-grpc
> h2. Reference
> * External implementation: https://github.com/nddipiazza/tika-pipes/tree/main/tika-pipes-fetchers/tika-fetcher-google-drive
> * Similar implementation: TIKA-4604 (Atlassian JWT fetcher)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
