TimothyDing opened a new pull request, #9741:
URL: https://github.com/apache/gravitino/pull/9741
What changes were proposed in this pull request?
This PR adds support for Hologres Catalog in Apache Gravitino, enabling
users to connect to and manage Alibaba Cloud Hologres databases through
Gravitino's unified metadata interface.
Key changes include:
1. New Hologres Catalog Module: Created catalog-jdbc-hologres module with
complete implementation
- HologresCatalog - Main catalog class extending JdbcCatalog
- HologresCatalogOperations - Catalog operations with driver management
- HologresSchemaOperations - Schema operations with Hologres-specific
system schemas filtering
- HologresTableOperations - Table operations supporting TABLE, VIEW, and
FOREIGN TABLE types
- HologresTypeConverter - Type converter for
Hologres/PostgreSQL-compatible types
- HologresExceptionConverter - Exception converter mapping SQLSTATE codes
- HologresColumnDefaultValueConverter - Column default value converter
2. Hologres-Specific Metadata Support: Added comprehensive metadata
extraction from hologres.hg_table_properties
- Distribution key
- Clustering key
- Primary key
- Storage format (ORC/SST)
- Orientation (row/column/mixed)
- Table group
- Lifecycle settings
- Create/DDL timestamps
3. System Schema Filtering: Configured to hide Hologres system schemas
from users:
- hologres_streaming_mv
- hologres_sample
- hg_internal
- hologres_object_table
- hg_recyclebin
4. Frontend Integration: Added Hologres provider configuration to web UI
with column type mappings and table properties
5. JDBC Driver Dependencies: Added PostgreSQL JDBC driver to runtime
dependencies for both Hologres and PostgreSQL catalogs
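As a rough illustration of the metadata extraction in item 2, the sketch below folds key/value rows into one property map per table and applies the hologres. prefix. It is not the PR's Java implementation. The row shape (schema, table, property key, property value) is an assumption about what a query against hologres.hg_table_properties returns, the function name is hypothetical, and the sample values are invented.

```python
from collections import defaultdict

PREFIX = "hologres."


def fold_table_properties(rows):
    """Group (schema, table, key, value) rows into one prefixed dict per table.

    The four-column row shape is an assumption, not taken from the PR.
    """
    tables = defaultdict(dict)
    for schema, table, key, value in rows:
        if value is None:
            # Skip unset properties instead of exposing empty entries.
            continue
        tables[(schema, table)][PREFIX + key] = value
    return dict(tables)


# Invented sample rows for illustration only.
rows = [
    ("public", "tbl_1", "distribution_key", "id"),
    ("public", "tbl_1", "orientation", "column"),
    ("public", "tbl_1", "clustering_key", None),
    ("public", "holo_test", "orientation", "row"),
]
props = fold_table_properties(rows)
print(props[("public", "tbl_1")])
```

Keeping the prefix in one constant makes it easy to tell Hologres-derived properties apart from properties set through Gravitino itself.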
Why are the changes needed?
Hologres is Alibaba Cloud's real-time OLAP database service that is
compatible with the PostgreSQL protocol. Many enterprises use Hologres as their
primary data warehousing solution, and Gravitino users need the ability to:
1. Unified Metadata Management: Manage Hologres metadata alongside other
data sources (Hive, Iceberg, MySQL, etc.) through a single interface
2. Data Governance: Apply consistent access control, auditing, and
discovery policies across Hologres data
3. Multi-Source Integration: Query Hologres data in conjunction with other
catalog sources through engines like Trino and Spark
4. Schema Discovery: Browse and search Hologres schemas, tables, and views
through Gravitino's web UI
The Hologres Catalog implementation:
- Leverages Hologres' PostgreSQL protocol compatibility
- Extends Gravitino's JDBC catalog framework
- Preserves Hologres-specific metadata (distribution keys, clustering
keys, storage formats)
- Filters out system schemas to provide a clean user experience
- Supports all Hologres table types: regular tables, views, and foreign
tables (MaxCompute external tables)
Fix: #N/A (new feature)
Does this PR introduce any user-facing change?
Yes, this PR introduces several user-facing changes:
1. New Catalog Provider: Users can now create Hologres catalogs via REST
API or Web UI with provider name jdbc-hologres
2. Required Properties: Hologres catalog requires the following
configuration properties:
- jdbc-driver: org.postgresql.Driver (pre-filled default)
- jdbc-url: Hologres connection URL (e.g.,
jdbc:postgresql://{endpoint}:{port}/{database})
- jdbc-user: Database username
- jdbc-password: Database password
- jdbc-database: Hologres database name
3. Supported Column Types: Hologres catalog supports the following column
types:
- binary, boolean, char, date, decimal, double, float, integer, long,
short
- string, time, timestamp, timestamp_tz, varchar
4. Table Properties: Tables loaded from Hologres carry Hologres-specific
properties under the hologres. prefix:
- hologres.table_id - Unique table identifier
- hologres.storage_format - Storage format (orc/sst)
- hologres.orientation - Storage mode (row/column/mixed)
- hologres.distribution_key - Distribution column(s)
- hologres.clustering_key - Clustering key column(s)
- hologres.primary_key - Primary key column(s)
- hologres.table_group - Table group name
- hologres.lifecycle_in_days - TTL setting
- hologres.create_time - Creation timestamp
- hologres.last_ddl_time - Last DDL timestamp
5. Schema Filtering: The following Hologres system schemas are
automatically filtered out:
- PostgreSQL system schemas: pg_toast, pg_catalog, information_schema
- Hologres system schemas: holo, hologres_streaming_mv, hologres_sample,
hg_internal, hologres_object_table, hg_recyclebin
6. Table Type Support: All three Hologres table types are discoverable:
- Regular tables
- Views
- Foreign tables (for MaxCompute and other external sources)
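The schema filtering in item 5 amounts to an exact-name exclusion. A minimal sketch, assuming case-sensitive matching and using only the schema names listed above; the function name is hypothetical and the PR's actual Java code may differ:

```python
# System schemas named in this PR description.
SYSTEM_SCHEMAS = frozenset({
    # PostgreSQL system schemas
    "pg_toast", "pg_catalog", "information_schema",
    # Hologres system schemas
    "holo", "hologres_streaming_mv", "hologres_sample",
    "hg_internal", "hologres_object_table", "hg_recyclebin",
})


def user_schemas(all_schemas):
    """Return schema names that are not system schemas, preserving order."""
    return [name for name in all_schemas if name not in SYSTEM_SCHEMAS]


visible = user_schemas(
    ["public", "hg_internal", "foreign_holo", "pg_catalog", "holo"]
)
print(visible)  # ['public', 'foreign_holo']
```

Exact matching leaves a user schema such as foreign_holo visible even though its name contains "holo".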
How was this patch tested?
The implementation was tested with a real Hologres instance:
1. Catalog Creation: Verified catalog creation via REST API
   curl -X POST "http://localhost:8090/api/metalakes/{metalake}/catalogs" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "hologres",
       "type": "relational",
       "provider": "jdbc-hologres",
       "properties": {
         "jdbc-url": "jdbc:postgresql://{endpoint}:{port}/{database}",
         "jdbc-user": "{username}",
         "jdbc-password": "{password}",
         "jdbc-database": "{database}",
         "jdbc-driver": "org.postgresql.Driver"
       }
     }'
2. Schema Listing: Verified that only user schemas are displayed (system
schemas filtered):
- Found: public, foreign_holo (user schemas)
- Hidden: hologres_streaming_mv, hologres_sample, hg_internal,
hologres_object_table, hg_recyclebin (system schemas)
3. Table Listing: Verified listing of all table types:
- Regular tables: tbl_1, holo_test
- Views: holo_view
- Foreign tables: adv_ad_feature, user_profile, etc.
4. Metadata Loading: Verified Hologres-specific properties are correctly
extracted:
- Distribution key: hologres.distribution_key
- Clustering key: hologres.clustering_key
- Storage format: hologres.storage_format
- All properties prefixed with hologres. for clarity
5. Web UI Integration: Verified Hologres appears as a provider option in
the frontend with proper configuration fields
6. JDBC Driver Loading: Fixed driver loading and verified that the
PostgreSQL JDBC driver is available at runtime for both Hologres and
PostgreSQL catalogs
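For scripted testing, the catalog-creation call from step 1 can be assembled in Python. This is a sketch only: it builds the request without sending it, the helper name is hypothetical, and the metalake, endpoint, port, and credentials are placeholders. The URL path and payload fields mirror the curl command above.

```python
import json
import urllib.request


def build_create_catalog_request(base_url, metalake, name,
                                 endpoint, port, database, user, password):
    """Build (but do not send) the POST request that creates a Hologres catalog."""
    payload = {
        "name": name,
        "type": "relational",
        "provider": "jdbc-hologres",
        "properties": {
            "jdbc-url": f"jdbc:postgresql://{endpoint}:{port}/{database}",
            "jdbc-user": user,
            "jdbc-password": password,
            "jdbc-database": database,
            "jdbc-driver": "org.postgresql.Driver",
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with a real instance before sending.
req = build_create_catalog_request(
    "http://localhost:8090", "demo_metalake", "hologres",
    "example-endpoint", 80, "demo_db", "demo_user", "demo_password",
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen(req) would send it to a running Gravitino server.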
Testing Instructions:
To test this PR:
1. Build the project: ./gradlew build -PpythonVersion=3.11
2. Create distribution: ./gradlew compileDistribution
3. Start Gravitino server: ./distribution/package/bin/gravitino.sh start
4. Create a Hologres catalog via REST API or Web UI
5. List schemas and verify system schemas are hidden
6. List tables and verify all table types are shown
7. Load a table and verify Hologres metadata properties are present with
hologres. prefix
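Step 7 can be checked mechanically once a table's properties have been loaded into a dictionary. The sketch below is illustrative: the helper name is hypothetical, the sample dictionary is invented, and the expected keys are the three properties this description reports as verified. Which properties a given table actually carries depends on how it was created.

```python
# Property keys this PR description reports as verified during testing.
EXPECTED_KEYS = (
    "hologres.distribution_key",
    "hologres.clustering_key",
    "hologres.storage_format",
)


def missing_properties(properties, expected=EXPECTED_KEYS):
    """Return the expected property keys absent from a loaded table's properties."""
    return [key for key in expected if key not in properties]


# Invented sample of a loaded table's properties.
loaded = {
    "hologres.distribution_key": "id",
    "hologres.storage_format": "orc",
    "comment": "demo",
}
missing = missing_properties(loaded)
print(missing)  # ['hologres.clustering_key']
```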
---
References:
- https://help.aliyun.com/zh/hologres
- https://help.aliyun.com/zh/hologres/developer-reference/system-tables
- https://help.aliyun.com/zh/hologres/user-guide/distribution-key
- https://help.aliyun.com/zh/hologres/user-guide/clustering-key