jerryshao commented on code in PR #9173:
URL: https://github.com/apache/gravitino/pull/9173#discussion_r2596907197


##########
docs/lakehouse-generic-catalog.md:
##########
@@ -0,0 +1,587 @@
+---
+title: "Generic Lakehouse Catalog"
+slug: /lakehouse-generic-catalog
+keywords:
+  - lakehouse
+  - lance
+  - metadata
+  - generic catalog
+  - file system
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Generic Lakehouse Catalog is a Gravitino catalog implementation designed to seamlessly integrate with lakehouse storage systems built on file system-based architectures. This catalog enables unified metadata management for lakehouse tables stored on various storage backends, providing a consistent interface for data discovery, governance, and access control.
+
+### What is a Lakehouse?
+
+A lakehouse combines the best features of data lakes and data warehouses:
+
+- **Data Lake Benefits**: 
+  - Low-cost storage for massive volumes of raw data
+  - Support for diverse data formats (structured, semi-structured, unstructured)
+  - Decoupled storage and compute for flexible scaling
+
+- **Data Warehouse Benefits**:
+  - ACID transactions for data consistency
+  - Schema enforcement and evolution
+  - High-performance analytical queries
+  - Time travel and versioning
+
+### Supported Storage Systems
+
+The catalog works with lakehouse systems built on top of:
+
+**Storage Backends:**
+- **Object Stores:** Amazon S3, Azure Blob Storage, Google Cloud Storage, MinIO
+- **Distributed File Systems:** HDFS, Apache Ozone
+- **Local File Systems:** For development and testing
+
+**Lakehouse Formats:**
+- **Lance** ✅ (currently the only fully supported format)
+
+:::info Current Support Status
+While the architecture is designed to support various lakehouse formats, Gravitino currently provides **native production support only for Lance-based lakehouse systems** with comprehensive testing and optimization.
+:::
+
+### Why Use Generic Lakehouse Catalog?
+
+1. **Unified Metadata Management**: Single source of truth for table metadata across multiple storage backends
+2. **Multi-Format Support**: Extensible architecture to support various lakehouse table formats
+3. **Storage Flexibility**: Work with any file system - local, HDFS, or cloud object stores
+4. **Gravitino Integration**: Leverage Gravitino's access control, lineage tracking, and data discovery
+5. **Easy Migration**: Register existing lakehouse tables without data movement
+
+### System Requirements
+
+**Storage Requirements:**
+- Lakehouse storage system must support standard file system operations:
+  - Directory listing and navigation
+  - File reading and writing with atomic operations
+  - File deletion and renaming
+  - Path-based access control (optional but recommended)
+
+**Gravitino Requirements:**
+- Gravitino server version 1.1.0 or later
+- Configured metalake for catalog creation
+- Appropriate permissions for catalog management
+
+**Network Requirements:**
+- Network connectivity between Gravitino server and storage backend
+- For cloud storage: Internet access and valid credentials
+- For HDFS: Proper Hadoop configuration and network access
+
+## Catalog Management
+
+### Capabilities
+
+The Generic Lakehouse Catalog provides comprehensive relational metadata management capabilities equivalent to standard relational catalogs:
+
+**Supported Operations:**
+- ✅ Create, read, update, and delete catalogs
+- ✅ List all catalogs in a metalake
+- ✅ Manage catalog properties and metadata
+- ✅ Set and modify catalog locations
+- ✅ Configure storage backend credentials
+
+For detailed information on available operations, see [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md).
+
+### Properties
+
+#### Required Properties
+
+| Property   | Description                                  | Example                          | Required |
+|------------|----------------------------------------------|----------------------------------|----------|
+| `provider` | Catalog provider type                        | `lakehouse-generic`              | Yes      |
+| `location` | Root storage path for all schemas and tables | `hdfs://namenode:9000/lakehouse` | No       |
+
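As an illustration of how the two properties in the table might be assembled into a catalog-creation request, here is a minimal Python sketch. The helper name `build_catalog_request` and the JSON field layout (`name`, `type`, `provider`, `comment`, `properties`) are assumptions and are not taken from this diff; only the `lakehouse-generic` provider value and the optional `location` property come from the table.

```python
import json
from typing import Optional


def build_catalog_request(name: str, location: Optional[str] = None, comment: str = "") -> dict:
    """Build a request body for creating a generic lakehouse catalog.

    Hypothetical helper: the field layout is an assumption for illustration,
    not a confirmed API contract.
    """
    properties = {}
    # `location` is optional per the properties table, so include it only when given.
    if location is not None:
        properties["location"] = location
    return {
        "name": name,
        "type": "RELATIONAL",             # assumed catalog type for table metadata
        "provider": "lakehouse-generic",  # provider value from the properties table
        "comment": comment,
        "properties": properties,
    }


if __name__ == "__main__":
    body = build_catalog_request("lance_catalog", "hdfs://namenode:9000/lakehouse")
    print(json.dumps(body, indent=2))
```

Omitting `location` yields an empty `properties` map, which matches the table's statement that the property is not required.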
+#### Key Property: `location`
+
+The `location` property specifies the root directory for the lakehouse storage system. All schemas and tables are stored under this location unless explicitly overridden at the schema or table level.
+
+**Location Resolution Hierarchy:**
+1. Table-level `location` (highest priority)
+2. Schema-level `location`
+3. Catalog-level `location` (fallback)
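The resolution order above can be sketched as a small Python function. This is only an illustration of the stated priority; the function name `resolve_location` and its return shape are hypothetical, and it makes no claim about how Gravitino derives the final table path from the chosen root.

```python
from typing import Optional, Tuple


def resolve_location(
    table_location: Optional[str] = None,
    schema_location: Optional[str] = None,
    catalog_location: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (level, location) for the highest-priority location that is set.

    Hypothetical sketch of the hierarchy: table, then schema, then catalog.
    Returns None when no level defines a location.
    """
    candidates = (
        ("table", table_location),
        ("schema", schema_location),
        ("catalog", catalog_location),
    )
    for level, location in candidates:
        # Treat None and empty strings as "not set".
        if location:
            return level, location
    return None


if __name__ == "__main__":
    print(resolve_location(schema_location="s3://bucket/sales",
                           catalog_location="hdfs://namenode:9000/lakehouse"))
```

With only a schema-level and a catalog-level value set, the schema-level value wins; the catalog-level value is used only when neither of the more specific levels is set.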
+
+

Review Comment:
   Remove additional blank line here and below.


