Copilot commented on code in PR #9228:
URL: https://github.com/apache/gravitino/pull/9228#discussion_r2556123143
##########
CLAUDE.md:
##########

@@ -0,0 +1 @@
+AGENTS.md

Review Comment:
The CLAUDE.md file contains only "AGENTS.md" as its content, which doesn't provide meaningful guidance. This file should either:
1. Contain actual Claude-specific instructions for AI coding assistance, or
2. Be structured as a proper reference/redirect (e.g., "See AGENTS.md for AI coding guidance")

The current single-word content is unclear and unhelpful for users.

##########
.github/copilot-instructions.md:
##########

@@ -0,0 +1,333 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# GitHub Copilot Code Review Instructions for Apache Gravitino
+
+This document provides comprehensive guidelines for GitHub Copilot and other AI assistants when reviewing code contributions to the Apache Gravitino project.
+
+## Core Review Principles
+
+1. **Apache Foundation Standards**: Ensure all contributions follow Apache Software Foundation guidelines
+2. **Backward Compatibility**: Breaking changes must be explicitly justified and documented
+3. **Test Coverage**: All code changes must include appropriate tests - no exceptions
+4. **Code Quality**: Prioritize maintainability, readability, and performance
+5. **Documentation**: Public APIs and complex logic must be well-documented
+
+## Code Review Checklist
+
+### 1. License and Legal Compliance
+
+- [ ] All new files have the Apache License 2.0 header
+- [ ] No GPL or other incompatible licenses in dependencies
+- [ ] Third-party code is properly attributed
+- [ ] LICENSE and NOTICE files updated if dependencies added
+- [ ] No copyrighted code without proper permissions
+
+### 2. Code Quality
+
+#### Java Code
+- [ ] Follows Google Java Style Guide (enforced by Spotless)
+- [ ] No wildcard imports
+- [ ] Proper use of `@Nullable` annotations
+- [ ] Exception handling is appropriate (don't catch generic `Exception`)
+- [ ] Resources are properly closed (use try-with-resources)
+- [ ] No `System.out.println()` - use proper logging
+- [ ] Logging levels are appropriate (DEBUG, INFO, WARN, ERROR)
+- [ ] No TODO or FIXME comments without issue references
+- [ ] Thread safety is considered and documented where relevant
+- [ ] No deprecated APIs unless necessary (document why)
+
+#### Scala Code
+- [ ] Follows Scala style conventions
+- [ ] Proper use of immutable collections
+- [ ] Pattern matching over if-else where appropriate
+- [ ] Avoid `null` - use `Option` instead
+
+#### Python Code
+- [ ] Follows PEP 8 style guidelines
+- [ ] Type hints for function parameters and return values
+- [ ] Proper docstrings for public methods
+- [ ] Use context managers for resource management
+
+### 3. Testing Requirements
+
+#### Unit Tests
+- [ ] All new methods/functions have unit tests
+- [ ] Edge cases and error conditions are tested
+- [ ] Tests are independent and can run in any order
+- [ ] Test names clearly describe what is being tested
+- [ ] Assertions have meaningful failure messages
+- [ ] No hard-coded timeouts (or well-justified)
+- [ ] Mock external dependencies appropriately
+
+#### Integration Tests
+- [ ] End-to-end scenarios are covered
+- [ ] Docker-dependent tests tagged with `@Tag("gravitino-docker-test")`
+- [ ] Tests clean up resources (temp files, containers, etc.)
+- [ ] Tests work in both embedded and deploy modes
+- [ ] Connection pooling and resource limits considered
+
+#### Test Anti-Patterns to Flag
+- [ ] Tests that depend on execution order
+- [ ] Tests with `Thread.sleep()` without justification
+- [ ] Tests that ignore exceptions
+- [ ] Tests without assertions
+- [ ] Flaky tests (consider marking as such)
+
+### 4. API Design
+
+#### Public APIs
+- [ ] Complete Javadoc with `@param`, `@return`, `@throws`
+- [ ] Include code examples in Javadoc for complex APIs
+- [ ] API is intuitive and follows existing patterns
+- [ ] Backward compatible or breaking change is documented
+- [ ] Proper use of interfaces vs. abstract classes
+- [ ] Builder pattern for complex object construction
+- [ ] Use `Into<T>` or `AsRef<T>` patterns where appropriate (for Rust-like flexibility)

Review Comment:
The comment references "Use `Into<T>` or `AsRef<T>` patterns where appropriate (for Rust-like flexibility)", which seems out of place in a Java/Scala project. These are Rust-specific patterns and don't apply to Gravitino's tech stack. This guideline should either be removed or clarified if there's an actual Java equivalent being referenced.
```suggestion
```

##########
AGENTS.md:
##########

@@ -0,0 +1,462 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# AGENTS.md
+
+This file provides guidance to AI coding agents collaborating on the Apache Gravitino repository.
+
+## Project Overview
+
+Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
+
+Gravitino acts as a centralized metadata management layer that:
+- Provides unified metadata access across diverse data sources (Hive, MySQL, Iceberg, Kafka, etc.)
+- Enables end-to-end data governance with access control, auditing, and discovery
+- Supports geo-distributed architectures for multi-region and multi-cloud deployments
+- Integrates seamlessly with query engines like Trino and Spark
+- Manages both data assets and AI model metadata
+
+## Project Requirements
+
+- Always use English in code, documentation, examples, and comments
+- Follow Apache Software Foundation guidelines and best practices
+- Code should be clean, maintainable, efficient, and well-documented
+- All public APIs must have comprehensive documentation
+- Maintain backward compatibility - breaking changes require careful consideration
+- Write meaningful tests for all features and bug fixes - **code without tests will not be merged**
+- Follow the project's coding standards and use Spotless for formatting
+
+## Architecture
+
+The project is organized as a Gradle multi-module project with the following key components:
+
+### Core Modules
+- `api/` - Public API definitions and interfaces for Gravitino
+- `common/` - Common utilities, types, and shared code across modules
+- `core/` - Core Gravitino server implementation and metadata management logic
+- `server/` - REST API server implementation and HTTP endpoints
+- `server-common/` - Shared server utilities and configurations
+
+### Client Modules
+- `clients/client-java/` - Java client library for Gravitino API
+- `clients/client-python/` - Python client library with bindings
+- `clients/cli/` - Command-line interface for Gravitino
+- `clients/filesystem-hadoop3/` - Hadoop FileSystem implementation
+- `clients/filesystem-fuse/` - FUSE filesystem integration (optional)
+
+### Catalog Implementations
+- `catalogs/catalog-common/` - Base classes and utilities for catalog implementations
+- `catalogs/catalog-hive/` - Apache Hive metastore catalog
+- `catalogs/catalog-lakehouse-iceberg/` - Apache Iceberg catalog
+- `catalogs/catalog-lakehouse-paimon/` - Apache Paimon catalog
+- `catalogs/catalog-lakehouse-hudi/` - Apache Hudi catalog
+- `catalogs/catalog-lakehouse-generic/` - Generic lakehouse catalog
+- `catalogs/catalog-jdbc-mysql/` - MySQL catalog
+- `catalogs/catalog-jdbc-postgresql/` - PostgreSQL catalog
+- `catalogs/catalog-jdbc-doris/` - Apache Doris catalog
+- `catalogs/catalog-kafka/` - Apache Kafka catalog
+- `catalogs/catalog-fileset/` - Fileset catalog for file-based data
+- `catalogs/catalog-model/` - AI model catalog
+
+### Integration Connectors
+- `spark-connector/` - Spark connector for Gravitino (Spark 3.3, 3.4, 3.5)
+- `flink-connector/` - Apache Flink connector (Scala 2.12 only)
+- `trino-connector/` - Trino connector for federated queries
+
+### Standalone Services
+- `iceberg/iceberg-rest-server/` - Standalone Iceberg REST catalog service
+- `lance/lance-rest-server/` - Lance format REST server
+
+### Authorization
+- `authorizations/authorization-common/` - Authorization framework
+- `authorizations/authorization-ranger/` - Apache Ranger integration
+- `authorizations/authorization-chain/` - Chain-based authorization
+
+### Other Components
+- `web/` - Web UI frontend (Next.js/React)
+- `docs/` - Documentation source files
+- `integration-test/` - End-to-end integration tests
+- `bundles/` - Cloud storage bundles (AWS, GCP, Azure, Aliyun)
+- `lineage/` - Data lineage tracking
+- `mcp-server/` - Model Context Protocol server for Gravitino
+
+## Common Development Commands
+
+### Build System
+
+Gravitino uses Gradle as its build system. All commands should be run from the project root.
+
+#### Basic Build Commands
+
+```bash
+# Clean build without tests (fast)
+./gradlew clean build -x test
+
+# Full build with all tests
+./gradlew build
+
+# Build specific module
+./gradlew :module-name:build
+
+# Compile and package distribution
+./gradlew compileDistribution
+
+# Create distribution tarball
+./gradlew assembleDistribution
+```
+
+#### Scala Version Selection
+
+Gravitino supports Scala 2.12 and 2.13 (default is 2.13):
+
+```bash
+# Build with Scala 2.12
+./gradlew build -PscalaVersion=2.12
+
+# Build with Scala 2.13 (default)
+./gradlew build -PscalaVersion=2.13
+```
+
+#### Python Client Build
+
+```bash
+# Build with Python 3.9 (default)
+./gradlew build
+
+# Build with specific Python version
+./gradlew build -PpythonVersion=3.10
+./gradlew build -PpythonVersion=3.11
+./gradlew build -PpythonVersion=3.12
+```
+
+#### Connector Builds
+
+```bash
+# Build Spark connector for Spark 3.4 with Scala 2.12
+./gradlew spark-connector:spark-runtime-3.4:build -PscalaVersion=2.12
+
+# Build Trino connector
+./gradlew assembleTrinoConnector
+
+# Build Iceberg REST server
+./gradlew assembleIcebergRESTServer
+```
+
+### Code Quality and Formatting
+
+Gravitino uses Spotless for code formatting:
+
+```bash
+# Check code formatting
+./gradlew spotlessCheck
+
+# Apply code formatting (ALWAYS run before committing)
+./gradlew spotlessApply
+
+# Compile triggers spotless check
+./gradlew compileJava
+```
+
+### Testing
+
+#### Unit Tests
+
+```bash
+# Run all unit tests (skip integration tests)
+./gradlew test -PskipITs
+
+# Run tests for specific module
+./gradlew :module-name:test
+
+# Run specific test class
+./gradlew :module-name:test --tests "com.example.TestClass"
+
+# Run specific test method
+./gradlew :module-name:test --tests "com.example.TestClass.testMethod"
+```
+
+#### Integration Tests
+
+Integration tests can run in two modes: `embedded` (default) and `deploy`.
+
+```bash
+# Run integration tests in embedded mode (uses MiniGravitino)
+./gradlew test -PskipTests -PtestMode=embedded
+
+# Run integration tests in deploy mode (requires distribution)
+./gradlew compileDistribution
+./gradlew test -PskipTests -PtestMode=deploy
+
+# Enable Docker-based tests
+./gradlew test -PskipDockerTests=false
+```
+
+#### Docker Test Environment
+
+For macOS users running Docker tests:
+
+```bash
+# Option 1: Use OrbStack (recommended)
+# Install from https://orbstack.dev/
+
+# Option 2: Use mac-docker-connector
+./dev/docker/tools/mac-docker-connector.sh
+```
+
+### Running Gravitino Server
+
+```bash
+# Start server (after building)
+./bin/gravitino.sh start
+
+# Stop server
+./bin/gravitino.sh stop
+
+# Or from distribution
+./distribution/package/bin/gravitino.sh start
+```
+
+## Key Technical Details
+
+### Language and JDK Requirements
+
+- **Build JDK**: Java 17 (required to run Gradle)
+- **Runtime JDK**: Java 17 (for server and connectors)
+- **Target Compatibility**: Some modules (clients, connectors) target JDK 8 for compatibility
+- **Scala**: Supports 2.12 and 2.13 (Flink only supports 2.12)
+- **Python**: 3.9, 3.10, 3.11, or 3.12
+
+### Build System Details
+
+- Gradle 8.x with Kotlin DSL
+- Gradle Java Toolchain for automatic JDK management
+- Error Prone for additional compile-time checks
+- Jacoco for code coverage reporting
+
+### Testing Framework
+
+- JUnit 5 (JUnit Platform) for all tests
+- TestContainers for Docker-based integration tests

Review Comment:
The tool name should be "Testcontainers" (one word, lowercase 'c') rather than "TestContainers". This is the official spelling of the Testcontainers framework.
```suggestion
- Testcontainers for Docker-based integration tests
```

##########
AGENTS.md:
##########

@@ -0,0 +1,462 @@
+### Build System Details
+
+- Gradle 8.x with Kotlin DSL
+- Gradle Java Toolchain for automatic JDK management
+- Error Prone for additional compile-time checks
+- Jacoco for code coverage reporting

Review Comment:
The tool name should be "JaCoCo" (with proper capitalization) rather than "Jacoco". JaCoCo is the official spelling of the Java Code Coverage tool.
```suggestion
- JaCoCo for code coverage reporting
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
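The Java code-quality checklist quoted above asks reviewers to check for try-with-resources and for catching specific exceptions rather than generic `Exception`. A minimal sketch of both patterns (the class and method names here are hypothetical, not from the Gravitino codebase):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceHandlingExample {

    // The reader is declared in a try-with-resources header, so it is
    // closed automatically even if readLine() throws; callers see the
    // specific IOException rather than a broad Exception.
    static String readFirstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) {
        try {
            Path tmp = Files.createTempFile("example", ".txt");
            Files.writeString(tmp, "hello\nworld\n");
            System.out.println(readFirstLine(tmp)); // prints "hello"
            Files.deleteIfExists(tmp);
        } catch (IOException e) { // catch the specific exception, not Exception
            System.err.println("I/O failure: " + e.getMessage());
        }
    }
}
```

A reviewer applying the checklist would flag the opposite shape: a manually closed reader whose `close()` is skipped on an exception path, or a `catch (Exception e)` that swallows unrelated failures.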
