This is an automated email from the ASF dual-hosted git repository.
hansva pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hop.git
The following commit(s) were added to refs/heads/main by this push:
new 2dedc4562a GCP and BigQuery connection updates. fixes #5721 (#5722)
2dedc4562a is described below
commit 2dedc4562a63b0021f261d61b23469a8b8f34a61
Author: Bart Maertens <[email protected]>
AuthorDate: Mon Sep 29 09:57:12 2025 +0200
GCP and BigQuery connection updates. fixes #5721 (#5722)
---
.../pages/database/databases/googlebigquery.adoc | 81 ++++++++++++++++++++++
.../pages/snippets/gcp-service-account-setup.adoc | 74 ++++++++++++++++++++
2 files changed, 155 insertions(+)
diff --git a/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc b/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
index 596917b2c3..47955de64d 100644
--- a/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
+++ b/docs/hop-user-manual/modules/ROOT/pages/database/databases/googlebigquery.adoc
@@ -34,6 +34,8 @@ under the License.
|Driver folder | <Hop Installation>/lib/jdbc
|===
+== Driver Installation
+
The Simba driver is packaged as a .zip containing many jars. Only a subset of
the jars included with the Driver are necessary to use Bigquery JDBC with
Apache Hop. Furthermore, some of the jars may conflict with those packaged with
Hop and *must* be excluded.
**SIMBA DRIVER JARS TO EXCLUDE (THESE JARS ARE INCLUDED IN THE SIMBA DRIVER,
BUT CONFLICT WITH HOP LIBRARIES AND MUST BE EXCLUDED**
@@ -61,3 +63,82 @@ The Simba driver is packaged as a .zip containing many jars. Only a subset of th
threetenbp-<VERSION>.jar
pass:[*] Tested with Hop 2.5.0 and Simba 1.3.3.1004. Not all authentication
methods tested either, so this list may not be exhaustive
+
+== Connection Configuration
+
+=== Basic Connection Settings
+
+When creating a Google BigQuery connection in Apache Hop, configure the following basic settings:
+
+[cols="2*",options="header"]
+|===
+| Setting | Value
+| Connection name | Your preferred connection name (e.g., "BQ")
+| Connection type | Google BigQuery
+| Access | Native (JDBC)
+| Server host name | https://www.googleapis.com/bigquery/v2
+| Database name | Your Google Cloud Project ID
+| Port number | 443
+|===
+
+=== Authentication Options
+
+Google BigQuery connections in Apache Hop support multiple authentication methods. The most common and recommended approach is using Service Account authentication with a JSON key file.
+
+In the *Options* tab of the connection dialog, configure the following parameters:
+
+[cols="3*",options="header"]
+|===
+| Parameter | Value | Description
+| OAuthType | 0 | Service Account authentication
+| ProjectId | your-project-id | Your Google Cloud Project ID
+| OAuthServiceAcctEmail | [email protected] | Service account email address
+| OAuthPvtKeyPath | /path/to/service-account-key.json | Path to the service account JSON key file
+| TimeOut | 3600 | Connection timeout in seconds (optional)
+|===
+
+=== Example Configuration
+
+Here's a complete example of a working BigQuery connection configuration:
+
+* **Connection name**: BQ
+* **Server host name**: https://www.googleapis.com/bigquery/v2
+* **Database name**: your-project-id
+* **Port number**: 443
+
+**Options tab parameters**:
+
+* **OAuthType**: 0
+* **ProjectId**: your-project-id
+* **OAuthServiceAcctEmail**: [email protected]
+* **OAuthPvtKeyPath**: /path/to/your-service-account-key.json
+* **TimeOut**: 3600
+
+include::../../snippets/gcp-service-account-setup.adoc[]
+
+== Testing the Connection
+
+After configuring your connection:
+
+1. Click the "Test" button in the connection dialog
+2. If successful, you should see a confirmation message
+3. If the test fails, verify:
+ - Your service account has the necessary BigQuery permissions
+ - The JSON key file path is correct and accessible
+ - Your project ID matches the project the service account belongs to (the `project_id` field in the JSON key file)
+ - Network connectivity to Google APIs is available
+
+== Troubleshooting
+
+=== Common Issues
+
+* **Authentication errors**: Verify that your service account has the required BigQuery roles and that the JSON key file path is correct
+* **Project not found**: Ensure the ProjectId parameter matches your actual Google Cloud Project ID
+* **Connection timeout**: Increase the TimeOut value if you're experiencing slow connections
+* **Driver conflicts**: Ensure you've excluded the conflicting gRPC jars as listed in the Driver Installation section
+
+=== Performance Considerations
+
+* Use appropriate timeouts for large query operations
+* Consider using BigQuery's standard SQL dialect for better performance
+* Implement proper error handling in your workflows when working with large datasets
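The Options-tab parameters in the Example Configuration above are plain key/value pairs, so it can help to see how they might end up in a single JDBC URL. The sketch below is illustrative only: it assumes the Simba-style shape `jdbc:bigquery://<host>:<port>;Key=Value;...`, the helper name `build_bigquery_jdbc_url` is hypothetical, and the service account address is a made-up placeholder. Hop assembles the real URL itself, so treat this as a way to reason about the settings, not as Hop's implementation.

```python
def build_bigquery_jdbc_url(host, port, options):
    """Join host, port, and Options-tab parameters into one URL string.

    Assumed shape (not taken from the commit):
    jdbc:bigquery://<host>:<port>;Key=Value;Key=Value
    """
    parts = [f"jdbc:bigquery://{host}:{port}"]
    for name, value in options.items():
        parts.append(f"{name}={value}")
    return ";".join(parts)


# Values mirror the Example Configuration; the email address is hypothetical.
options = {
    "ProjectId": "your-project-id",
    "OAuthType": 0,
    "OAuthServiceAcctEmail": "hop-reader@your-project-id.iam.gserviceaccount.com",
    "OAuthPvtKeyPath": "/path/to/your-service-account-key.json",
    "TimeOut": 3600,
}

url = build_bigquery_jdbc_url("https://www.googleapis.com/bigquery/v2", 443, options)
print(url)
```

Printing the assembled string makes a mistyped parameter name or a missing `ProjectId` easy to spot before pressing "Test" in the connection dialog.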
diff --git a/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc b/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc
new file mode 100644
index 0000000000..f8774bf631
--- /dev/null
+++ b/docs/hop-user-manual/modules/ROOT/pages/snippets/gcp-service-account-setup.adoc
@@ -0,0 +1,74 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+////
+
+== Google Cloud Platform Service Account Setup
+
+To connect to Google Cloud services, you need to set up a service account in your Google Cloud Platform (GCP) project and download the authentication credentials.
+
+=== Creating a Service Account
+
+1. **Navigate to the Google Cloud Console**:
+ - Go to https://console.cloud.google.com/
+ - Select your project or create a new one
+
+2. **Create a Service Account**:
+ - Navigate to "IAM & Admin" > "Service Accounts"
+ - Click "Create Service Account"
+ - Provide a name and description for your service account
+ - Click "Create and Continue"
+
+3. **Assign Roles**:
+ - Assign the appropriate roles to your service account based on the service you're connecting to:
+ * For BigQuery:
+ ** `BigQuery Data Viewer` (`roles/bigquery.dataViewer`) - for read access
+ ** `BigQuery Job User` (`roles/bigquery.jobUser`) - for running queries (required)
+ ** `BigQuery Data Editor` (`roles/bigquery.dataEditor`) - if write access is needed
+ ** `BigQuery Data Owner` (`roles/bigquery.dataOwner`) - if full control is needed
+ * For Cloud Storage: `Storage Object Viewer`, `Storage Object Admin` (if write access is needed)
+ * For other services: Consult the specific service documentation for required roles
+ - Click "Continue" and then "Done"
+
+4. **Generate and Download Key**:
+ - Click on the created service account
+ - Go to the "Keys" tab
+ - Click "Add Key" > "Create new key"
+ - Select "JSON" format
+ - Click "Create" to download the key file
+
+5. **Secure the Key File**:
+ - Store the downloaded JSON key file in a secure location
+ - Note the full path to this file - you'll need it for authentication configuration
+ - Ensure the file has appropriate permissions (readable only by the user running Hop)
+
+=== Alternative: Using Application Default Credentials
+
+Apache Hop can also use Google Cloud's Application Default Credentials (ADC) if you're running Hop on Google Cloud Platform or have configured the Google Cloud SDK locally.
+
+To use ADC:
+1. Install and configure the Google Cloud SDK
+2. Run `gcloud auth application-default login` to set up default credentials
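+3. In your Hop connection, omit the service account key parameters (`OAuthServiceAcctEmail` and `OAuthPvtKeyPath`) and set `OAuthType` to the value your driver version documents for ADC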
+
+This method is particularly useful for development environments or when running Hop on Google Cloud Platform services.
+
+=== Security Best Practices
+
+* **Principle of Least Privilege**: Only assign the minimum roles necessary for your use case
+* **Key Rotation**: Regularly rotate service account keys (recommended every 90 days)
+* **Environment Variables**: Consider using environment variables to store key file paths instead of hardcoding them
+* **Access Control**: Restrict access to service account key files using appropriate file system permissions
+* **Monitoring**: Enable audit logging to monitor service account usage
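The key-file advice above (store it securely, keep it readable only by the user running Hop, make sure it belongs to the expected project) can be checked before the file path is entered in a connection. The following is a minimal sketch, not part of Hop: the function name `check_key_file` is hypothetical, and it assumes the standard service account key layout with `type`, `project_id`, `private_key`, and `client_email` fields. The permission check only applies on POSIX systems.

```python
import json
import os
import stat
import tempfile

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path, expected_project_id):
    """Return a list of problems found; an empty list means none were detected."""
    if not os.path.isfile(path):
        return [f"key file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"key file unreadable or not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key file does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    if "project_id" in key and key["project_id"] != expected_project_id:
        problems.append("project_id does not match the ProjectId parameter")

    # POSIX only: flag a key file that group or other users can access.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if os.name == "posix" and mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"file mode {oct(mode)} is accessible to group or others")
    return problems


# Demonstration with a throwaway file holding placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "key.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump(
            {
                "type": "service_account",
                "project_id": "your-project-id",
                "private_key": "placeholder",
                "client_email": "placeholder",
            },
            handle,
        )
    os.chmod(demo_path, 0o600)
    demo_problems = check_key_file(demo_path, "your-project-id")
    mismatch_problems = check_key_file(demo_path, "another-project")

print(demo_problems)
print(mismatch_problems)
```

Returning a list of findings instead of raising on the first one lets a single run report every problem at once, which matches the checklist style of the troubleshooting guidance.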