This is an automated email from the ASF dual-hosted git repository.

hansva pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hop.git


The following commit(s) were added to refs/heads/main by this push:
     new 2dedc4562a GCP and BigQuery connection updates. fixes #5721 (#5722)
2dedc4562a is described below

commit 2dedc4562a63b0021f261d61b23469a8b8f34a61
Author: Bart Maertens <[email protected]>
AuthorDate: Mon Sep 29 09:57:12 2025 +0200

    GCP and BigQuery connection updates. fixes #5721 (#5722)
---
 .../pages/database/databases/googlebigquery.adoc   | 81 ++++++++++++++++++++++
 .../pages/snippets/gcp-service-account-setup.adoc  | 74 ++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git 
a/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
 
b/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
index 596917b2c3..47955de64d 100644
--- 
a/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
+++ 
b/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
@@ -34,6 +34,8 @@ under the License.
 |Driver folder | <Hop Installation>/lib/jdbc
 |===
 
+== Driver Installation
+
 The Simba driver is packaged as a .zip containing many jars. Only a subset of 
the jars included with the Driver are necessary to use Bigquery JDBC with 
Apache Hop. Furthermore, some of the jars may conflict with those packaged with 
Hop and *must* be excluded.
 
 **SIMBA DRIVER JARS TO EXCLUDE (THESE JARS ARE INCLUDED IN THE SIMBA DRIVER, 
BUT CONFLICT WITH HOP LIBRARIES AND MUST BE EXCLUDED**
@@ -61,3 +63,82 @@ The Simba driver is packaged as a .zip containing many jars. 
Only a subset of th
  threetenbp-<VERSION>.jar
 
 pass:[*] Tested with Hop 2.5.0 and Simba 1.3.3.1004. Not all authentication 
methods tested either, so this list may not be exhaustive
+
+== Connection Configuration
+
+=== Basic Connection Settings
+
+When creating a Google BigQuery connection in Apache Hop, configure the 
following basic settings:
+
+[cols="2*",options="header"]
+|===
+| Setting | Value
+| Connection name | Your preferred connection name (e.g., "BQ")
+| Connection type | Google BigQuery
+| Access | Native (JDBC)
+| Server host name | https://www.googleapis.com/bigquery/v2
+| Database name | Your Google Cloud Project ID
+| Port number | 443
+|===
+
+=== Authentication Options
+
+Google BigQuery connections in Apache Hop support multiple authentication 
methods. The most common and recommended approach is using Service Account 
authentication with a JSON key file.
+
+In the *Options* tab of the connection dialog, configure the following 
parameters:
+
+[cols="3*",options="header"]
+|===
+| Parameter | Value | Description
+| OAuthType | 0 | Service Account authentication
+| ProjectId | your-project-id | Your Google Cloud Project ID
+| OAuthServiceAcctEmail | [email protected] 
| Service account email address
+| OAuthPvtKeyPath | /path/to/service-account-key.json | Path to the service 
account JSON key file
+| TimeOut | 3600 | Connection timeout in seconds (optional)
+|===
+
+=== Example Configuration
+
+Here's a complete example of a working BigQuery connection configuration:
+
+* **Connection name**: BQ
+* **Server host name**: https://www.googleapis.com/bigquery/v2
+* **Database name**: your-project-id
+* **Port number**: 443
+
+**Options tab parameters**:
+
+* **OAuthType**: 0
+* **ProjectId**: your-project-id
+* **OAuthServiceAcctEmail**: 
[email protected]
+* **OAuthPvtKeyPath**: /path/to/your-service-account-key.json
+* **TimeOut**: 3600
+
+include::../../snippets/gcp-service-account-setup.adoc[]
+
+== Testing the Connection
+
+After configuring your connection:
+
+1. Click the "Test" button in the connection dialog
+2. If successful, you should see a confirmation message
+3. If the test fails, verify:
+   - Your service account has the necessary BigQuery permissions
+   - The JSON key file path is correct and accessible
+   - Your project ID matches the one in the service account
+   - Network connectivity to Google APIs is available
+
+== Troubleshooting
+
+=== Common Issues
+
+* **Authentication errors**: Verify that your service account has the required 
BigQuery roles and that the JSON key file path is correct
+* **Project not found**: Ensure the ProjectId parameter matches your actual 
Google Cloud Project ID
+* **Connection timeout**: Increase the TimeOut value if you're experiencing 
slow connections
+* **Driver conflicts**: Ensure you've excluded the conflicting GRPC jars as 
listed in the driver installation section
+
+=== Performance Considerations
+
+* Use appropriate timeouts for large query operations
+* Consider using BigQuery's standard SQL dialect for better performance
+* Implement proper error handling in your workflows when working with large 
datasets
diff --git 
a/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc
 
b/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc
new file mode 100644
index 0000000000..f8774bf631
--- /dev/null
+++ 
b/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc
@@ -0,0 +1,74 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+////
+
+== Google Cloud Platform Service Account Setup
+
+To connect to Google Cloud services, you need to set up a service account in 
your Google Cloud Platform (GCP) project and download the authentication 
credentials.
+
+=== Creating a Service Account
+
+1. **Navigate to the Google Cloud Console**:
+   - Go to https://console.cloud.google.com/
+   - Select your project or create a new one
+
+2. **Create a Service Account**:
+   - Navigate to "IAM & Admin" > "Service Accounts"
+   - Click "Create Service Account"
+   - Provide a name and description for your service account
+   - Click "Create and Continue"
+
+3. **Assign Roles**:
+   - Assign the appropriate roles to your service account based on the service 
you're connecting to:
+     * For BigQuery: 
+       ** `BigQuery Data Viewer` (`roles/bigquery.dataViewer`) - for read 
access
+       ** `BigQuery Job User` (`roles/bigquery.jobUser`) - for running queries 
(required)
+       ** `BigQuery Data Editor` (`roles/bigquery.dataEditor`) - if write 
access needed
+       ** `BigQuery Data Owner` (`roles/bigquery.dataOwner`) - if full control 
needed
+     * For Cloud Storage: `Storage Object Viewer`, `Storage Object Admin` (if 
write access needed)
+     * For other services: Consult the specific service documentation for 
required roles
+   - Click "Continue" and then "Done"
+
+4. **Generate and Download Key**:
+   - Click on the created service account
+   - Go to the "Keys" tab
+   - Click "Add Key" > "Create new key"
+   - Select "JSON" format
+   - Click "Create" to download the key file
+
+5. **Secure the Key File**:
+   - Store the downloaded JSON key file in a secure location
+   - Note the full path to this file - you'll need it for authentication 
configuration
+   - Ensure the file has appropriate permissions (readable only by the user 
running Hop)
+
+=== Alternative: Using Application Default Credentials
+
+Apache Hop can also use Google Cloud's Application Default Credentials (ADC) 
if you're running Hop on Google Cloud Platform or have configured the Google 
Cloud SDK locally.
+
+To use ADC:
+1. Install and configure the Google Cloud SDK
+2. Run `gcloud auth application-default login` to set up default credentials
+3. In your Hop connection, you can omit service account key file parameters
+
+This method is particularly useful for development environments or when 
running Hop on Google Cloud Platform services.
+
+=== Security Best Practices
+
+* **Principle of Least Privilege**: Only assign the minimum roles necessary 
for your use case
+* **Key Rotation**: Regularly rotate service account keys (recommended every 
90 days)
+* **Environment Variables**: Consider using environment variables to store key 
file paths instead of hardcoding them
+* **Access Control**: Restrict access to service account key files using 
appropriate file system permissions
+* **Monitoring**: Enable audit logging to monitor service account usage

Reply via email to