This is an automated email from the ASF dual-hosted git repository.
jmclean pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new b2b2338d52 [doc] Revise the glossary documentation (#5837)
b2b2338d52 is described below
commit b2b2338d52003eceaa2e8ee959b73baf5c32c72a
Author: Qiming Teng <[email protected]>
AuthorDate: Mon Jan 13 13:54:20 2025 +0800
[doc] Revise the glossary documentation (#5837)
### What changes were proposed in this pull request?
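This PR revises the glossary docs: it reorders the entries, splits long definitions into shorter sentences, and points full-name entries to their abbreviations.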
### Why are the changes needed?
The glossary is reordered for quick reference.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
---
docs/glossary.md | 384 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 226 insertions(+), 158 deletions(-)
diff --git a/docs/glossary.md b/docs/glossary.md
index 83e97d915a..3b42a6c773 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -4,41 +4,180 @@ date: 2023-11-28
license: "This software is licensed under the Apache License version 2."
---
+## API
+
+- Application Programming Interface, defining the methods and protocols for
interacting with a server.
+
+## AWS
+
+- Amazon Web Services, a cloud computing platform provided by Amazon.
+
+## AWS Glue
+
+- A compatible implementation of the Hive Metastore Service (HMS).
+
+## GPG/GnuPG
+
+- GNU Privacy Guard (GnuPG) is an open-source implementation of the OpenPGP
standard.
+ It is usually used for encrypting and signing files and emails.
+
+## HDFS
+
+- **HDFS** (Hadoop Distributed File System) is an open-source distributed file
system.
+ It is a key component of the Apache Hadoop ecosystem.
+ HDFS is designed as a distributed storage solution to store and process
large-scale datasets.
+ It features high reliability, fault tolerance, and excellent performance.
+
+## HTTP port
+
+- The port number on which a server listens for incoming connections.
+
+## IP address
+
+- Internet Protocol address, a numerical label assigned to each device in a
computer network.
+
+## JDBC
+
+- Java Database Connectivity, an API for connecting Java applications to
relational databases.
+
+## JDBC URI
+
+- The JDBC connection address specified in the catalog configuration.
+ It usually includes components such as the database type, host, port, and
database name.
+
+## JDK
+
+- Java Development Kit, a software development kit for the Java programming language.
+ A JDK provides tools for compiling, debugging, and running Java applications.
+
+## JMX
+
+- Java Management Extensions provides tools for managing and monitoring Java
applications.
+
+## JSON
+
+- JavaScript Object Notation, a lightweight data interchange format.
+
+## JSON Web Token
+
+- See [JWT](#jwt).
+
+## JVM
+
+- Java Virtual Machine, a virtual machine that enables a computer to run Java applications.
+ A JVM implements an abstract machine that is different from the underlying
hardware.
+
+## JVM instrumentation
+
+- The process of adding monitoring and management capabilities to the
[JVM](#jvm).
+ Instrumentation is mainly used to collect performance metrics.
+
+## JVM metrics
+
+- Metrics related to the performance and behavior of the [Java Virtual
Machine](#jvm).
+ Some valuable metrics are memory usage, garbage collection, and buffer pool
metrics.
+
+## JWT
+
+- JSON Web Token, a compact, URL-safe means of representing claims transferred between two parties.
+
+## KEYS file
+
+- A file containing public keys used to sign previous releases, necessary for
verifying signatures.
+
+## PGP signature
+
+- A digital signature generated using Pretty Good Privacy (PGP).
+ The signature is typically used to validate the authenticity of a file.
+
+## REST
+
+- Representational State Transfer, a set of architectural principles for designing networked applications.
+
+## REST API
+
+- Representational State Transfer (REST) Application Programming Interface.
+ A set of rules and conventions for building and interacting with Web
services using standard HTTP methods.
+
+## SHA256 checksum
+
+- A hash value computed with the SHA-256 cryptographic hash function, used to verify the integrity of files.
+
+## SHA256 checksum file
+
+- A file containing the SHA256 hash value of another file, used for
verification purposes.
+
+## SQL
+
+- Structured Query Language, a programming language used to manage and manipulate relational databases.
+
+## SSH
+
+- Secure Shell, a cryptographic network protocol used for secure communication
over a computer network.
+
+## URI
+
+- Uniform Resource Identifier, a string that identifies a name or a resource
on the internet.
+
+## YAML
+
+- YAML Ain't Markup Language, a human-readable file format often used for
structured data.
+
+## Amazon Elastic Block Store (EBS)
+
+- A scalable block storage service provided by Amazon Web Services (AWS).
+
+## Apache Gravitino
+
+- An open-source software platform initially created by Datastrato.
+ It is designed for high-performance, geo-distributed, and federated metadata
lakes.
+ Gravitino can manage metadata directly in different sources, types, and
regions,
+ providing data and AI assets with unified metadata access.
+
+## Apache Gravitino configuration file (gravitino.conf)
+
+- The configuration file for the Gravitino server, located in the `conf`
directory.
+ It follows the standard properties file format and contains settings for the
Gravitino server.
+
## Apache Hadoop
- An open-source distributed storage and processing framework.
## Apache Hive
-- An open-source data warehousing and SQL-like query language software project
for managing and querying large datasets.
+- An open-source data warehousing software project.
+ It provides a SQL-like query language for managing and querying large datasets.
## Apache Iceberg
- An open-source, versioned table format for large-scale data processing.
-## Apache License version 2
+## Apache Iceberg Hive catalog
-- A permissive, open-source software license written by The Apache Software
Foundation.
+- The **Iceberg Hive catalog** is a metadata service designed for the Apache
Iceberg table format.
+ It allows external systems to interact with Iceberg metadata using a Hive
Metastore Thrift client.
-## API
+## Apache Iceberg JDBC catalog
-- Application Programming Interface, defining the methods and protocols for
interacting with a server.
+- The **Iceberg JDBC catalog** is a metadata service designed for the Apache
Iceberg table format.
+ It enables external systems to interact with an Iceberg metadata service
using [JDBC](#jdbc).
-## Authentication mechanism
+## Apache Iceberg REST catalog
-- The method used to verify the identity of users and clients accessing a
server.
+- The **Iceberg REST catalog** is a metadata service designed for the Apache
Iceberg table format.
+ It enables external systems to interact with an Iceberg metadata service using
a [REST API](#rest-api).
-## AWS
+## Apache License version 2
-- Amazon Web Services, a cloud computing platform provided by Amazon.
+- A permissive, open-source software license written by The Apache Software
Foundation.
-## AWS Glue
+## Authentication mechanism
-- A compatible implementation of the Hive Metastore Service (HMS).
+- The method used to verify the identity of users and clients accessing a
server.
## Binary distribution package
-- A package containing the compiled and executable version of the software,
ready for distribution and deployment.
+- A software package containing the compiled executables for distribution and
deployment.
## Catalog
@@ -50,15 +189,12 @@ license: "This software is licensed under the Apache
License version 2."
## Columns
-- The individual fields or attributes of a table, specifying details such as
name, data type, comment, and nullability.
+- The individual fields or attributes of a table.
+ Each column has properties like name, data type, comment, and nullability.
## Continuous integration (CI)
-- The practice of automatically building, testing, and validating code changes
when they are committed to version control.
-
-## Contributor covenant
-
-- A widely-used and recognized code of conduct for open-source communities. It
provides guidelines for creating a welcoming and inclusive environment for all
contributors.
+- The practice of automatically building and testing code changes when they
are committed to version control.
## Dependencies
@@ -74,51 +210,56 @@ license: "This software is licensed under the Apache
License version 2."
## Docker container
-- A lightweight, standalone, executable package that includes everything
needed to run a piece of software, including the code, runtime, libraries, and
system tools.
+- A lightweight, standalone package that includes everything needed to run the
software.
+ A container packages an application with its dependencies and runtime for
distribution.
## Docker Hub
-- A cloud-based registry service for Docker containers, allowing users to
share and distribute containerized applications.
+- A cloud-based registry service for Docker containers.
+ Users can publish, browse, and download containerized software using this
service.
## Docker image
-- A lightweight, standalone, and executable package that includes everything
needed to run a piece of software, including the code, runtime, libraries, and
system tools.
+- A lightweight, standalone package that includes everything needed to run the
software.
+ A Docker image typically comprises the code, runtime, libraries, and system
tools.
-## Docker file
+## Dockerfile
-- A configuration file used to create a Docker image, specifying the base
image, dependencies, and commands for building the image.
+- A configuration file for building a Docker image.
+ A Dockerfile contains instructions to build a standard image for
distributing the software.
-## Dropwizard Metrics
+## Dropwizard metrics
- A Java library for measuring the performance of applications and providing
support for various metric types.
-## Amazon Elastic Block Store (EBS)
-
-- A scalable block storage service provided by Amazon Web Services.
-
## Environment variables
-- Variables used to pass information to running processes.
+- Variables used to customize the runtime configuration for a process.
## Geo-distributed
- The distribution of data or services across multiple geographic locations.
+## Git
+
+- A distributed version control system used for tracking software artifacts.
+
## GitHub
-- A web-based platform for version control and collaboration using Git.
+- A web-based platform for version control and community collaboration using
Git.
## GitHub Actions
-- A continuous integration and continuous deployment (CI/CD) service provided
by GitHub, used for automating build, test, and deployment workflows.
+- A continuous integration and continuous deployment (CI/CD) service provided
by GitHub.
+ GitHub Actions automates build, test, and deployment workflows.
## GitHub labels
-- Tags assigned to GitHub issues or pull requests for organization,
categorization, or workflow automation.
+- Labels assigned to GitHub issues or pull requests for organization or
workflow automation.
## GitHub pull request
-- A proposed change to a repository submitted by a user through the GitHub
platform.
+- A proposed change to a GitHub repository submitted by a user.
## GitHub repository
@@ -126,127 +267,67 @@ license: "This software is licensed under the Apache
License version 2."
## GitHub workflow
-- A series of automated steps defined in a YAML file that runs in response to
events on a GitHub repository.
-
-## Git
-
-- A version control system used for tracking changes and collaborating on
source code.
-
-## GPG/GnuPG
-
-- Gnu Privacy Guard or GnuPG, an open-source implementation of the OpenPGP
standard, used for encrypting and signing files and emails.
+- A series of automated steps triggered by specific events on a GitHub
repository.
## Gradle
-- A build automation tool for building, testing, and deploying projects.
+- An automation tool for building, testing, and deploying projects.
## Gradlew
-- A Gradle wrapper script, used for executing Gradle commands without
installing Gradle separately.
-
-## Apache Gravitino
-
-- An open-source software platform originally created by Datastrato for
high-performance, geo-distributed, and federated metadata lakes. Designed to
manage metadata directly in different sources, types, and regions, providing
unified metadata access for data and AI assets.
-
-## Apache Gravitino configuration file (gravitino.conf)
-
-- The configuration file for the Gravitino server, located in the `conf`
directory. It follows the standard property file format and contains settings
for the Gravitino server.
+- A Gradle wrapper script used to execute Gradle commands.
## Hashes
-- Cryptographic hash values generated from the contents of a file, often used
for integrity verification.
-
-## HDFS
-
-- **HDFS** (Hadoop Distributed File System) is an open-source, distributed
file system and a key component of the Apache Hadoop ecosystem. It is designed
to store and process large-scale datasets, providing high reliability, fault
tolerance, and performance for distributed storage solutions.
+- Cryptographic hash values generated from some data.
+ A typical use case is to verify the integrity of a file.
## Headless
-- A system without a graphical user interface.
-
-## HTTP port
-
-- The port number on which a server listens for incoming connections.
-
-## Apache Iceberg Hive catalog
-
-- The **Iceberg Hive catalog** is a specialized metadata service designed for
the Apache Iceberg table format, allowing external systems to interact with
Iceberg metadata via a Hive metastore thrift client.
-
-## Apache Iceberg REST catalog
-
-- The **Iceberg REST Catalog** is a specialized metadata service designed for
the Apache Iceberg table format, allowing external systems to interact with
Iceberg metadata via a RESTful API.
-
-## Apache Iceberg JDBC catalog
-
-- The **Iceberg JDBC Catalog** is a specialized metadata service designed for
the Apache Iceberg table format, allowing external systems to interact with
Iceberg metadata using JDBC (Java Database Connectivity).
+- A system without a local console.
## Identity fields
-- Fields in tables that define the identity of the table, specifying how rows
in the table are uniquely identified.
+- Fields in tables that define the identity of the records.
+ In the scope of a table, the identity fields are used as the unique
identifier of a row.
## Integration tests
-- Tests designed to ensure the correctness and compatibility of software when
integrated into a unified system.
-
-## IP address
-
-- Internet Protocol address, a numerical label assigned to each device
participating in a computer network.
+- Tests that ensure software correctness and compatibility when integrating
components into a larger system.
## Java Database Connectivity (JDBC)
-- Java Database Connectivity, an API for connecting Java applications to
relational databases.
+- See [JDBC](#jdbc).
## Java Development Kits (JDKs)
-- Software development kits for the Java programming language, including tools
for compiling, debugging, and running Java applications.
-
-## Java Toolchain
+- See [JDK](#jdk).
-- A feature introduced in Gradle to detect and manage JDK versions.
+## Java Management Extensions
-## JDBC URI
-
-- The JDBC connection address specified in the catalog configuration,
including details such as the database type, host, port, and database name.
-
-## JMX
-
-- Java Management Extensions provides tools for managing and monitoring Java
applications.
-
-## JSON
-
-- JavaScript Object Notation, a lightweight data interchange format.
+- See [JMX](#jmx).
-## JWT(JSON Web Token)
-
-- A compact, URL-safe means of representing claims between two parties.
-
-## Java Virtual Machine (JVM)
-
-- A virtual machine that enables a computer to run Java applications,
providing an abstraction layer between the application and the underlying
hardware.
-
-## JVM metrics
+## Java Toolchain
-- Metrics related to the performance and behavior of the Java Virtual Machine
(JVM), including memory usage, garbage collection, and buffer pool metrics.
+- A Gradle feature for detecting and managing JDK versions.
-## JVM instrumentation
+## Java Virtual Machine
-- The process of adding monitoring and management capabilities to the Java
Virtual Machine, allowing for the collection of performance metrics.
+- See [JVM](#jvm).
## Key pair
- A pair of cryptographic keys, including a public key used for verification
and a private key used for signing.
-## KEYS file
-
-- A file containing public keys used to sign previous releases, necessary for
verifying signatures.
-
## Lakehouse
-- **Lakehouse** refers to a modern data management architecture that combines
elements of data lakes and data warehouses. It aims to provide a unified
platform for storing, managing, and analyzing both raw unstructured data
(similar to data lakes) and curated structured data.
+- **Lakehouse** is a modern data management architecture that combines
elements of data lakes and data warehouses.
+ It aims to provide a unified platform for storing, managing, and analyzing
both raw unstructured data
+ (similar to data lakes) and curated structured data.
## Manifest
-- A list of files and associated metadata that collectively define the
structure and content of a release or distribution.
+- A list of files and their associated metadata that collectively define the
structure and content of a release or distribution.
## Merge operation
@@ -254,7 +335,9 @@ license: "This software is licensed under the Apache
License version 2."
## Metalake
-- The top-level container for metadata. Typically, a metalake is a tenant-like
mapping to an organization or a company. All the catalogs, users, and roles are
under one metalake.
+- The top-level container for metadata.
+ Typically, a metalake is a tenant-like mapping to an organization or a
company.
+ All the catalogs, users, and roles are associated with one metalake.
## Metastore
@@ -264,17 +347,14 @@ license: "This software is licensed under the Apache
License version 2."
- A distinct and separable part of a project.
-## OrbStack
-
-- A tool mentioned as an alternative to Docker for macOS when running
Gravitino integration tests.
-
## Open authorization / OAuth
-- A standard protocol for authorization that allows third-party applications
to access user data without exposing user credentials.
+- A standard protocol for authorization that allows third-party applications
to access a user's resources on the user's behalf.
+ The application doesn't need to access the user credentials.
-## PGP Signature
+## OrbStack
-- A digital signature generated using the Pretty Good Privacy (PGP) algorithm,
confirming the authenticity of a file.
+- A tool mentioned as an alternative to Docker for macOS when running
Gravitino integration tests.
## Private key
@@ -282,31 +362,33 @@ license: "This software is licensed under the Apache
License version 2."
## Properties
-- Configurable settings and attributes associated with catalogs, schemas, and
tables, to influence their behavior and storage.
+- Configurable settings and attributes associated with catalogs, schemas, and
tables.
+ The property settings influence the behavior and storage of the
corresponding entities.
## Protocol buffers (protobuf)
-- A method developed by Google for serializing structured data, similar to XML
or JSON. It is often used for efficient and extensible communication between
systems.
+- A method developed by Google for serializing structured data, similar to XML
or JSON.
+ It is often used for efficient and extensible communication between systems.
## Public key
- An openly shared key used for verification, encryption, or other operations
intended for public knowledge.
-## Representational State Transfer (REST)
+## Representational State Transfer
-- A set of architectural principles for designing networked applications.
+- See [REST](#rest).
-## REST API (Representational State Transfer Application Programming Interface)
+## RocksDB
-- A set of rules and conventions for building and interacting with web
services using standard HTTP methods.
+- An open-source key-value storage database.
## Schema
- A logical container for organizing tables in a database.
-## Secure Shell (SSH)
+## Secure Shell
-- Secure Shell, a cryptographic network protocol used for secure communication
over a computer network.
+- See [SSH](#ssh).
## Security group
@@ -314,15 +396,8 @@ license: "This software is licensed under the Apache
License version 2."
## Serde
-- A Serialization/Deserialization library responsible for transforming data
between a tabular format and a format suitable for storage or transmission.
-
-## SHA256 checksum
-
-- A cryptographic hash function used to verify the integrity of files.
-
-## SHA256 checksum file
-
-- A file containing the SHA256 hash value of another file, used for
verification purposes.
+- A serialization/deserialization library.
+ It can transform data between a tabular format and a format suitable for
storage or transmission.
## Snapshot
@@ -336,21 +411,22 @@ license: "This software is licensed under the Apache
License version 2."
- A tool or process used to enforce code formatting standards and apply
automatic formatting to code.
-## Structured Query Language (SQL)
+## Structured Query Language
-- A programming language used to manage and manipulate relational databases.
+- See [SQL](#sql).
## Table
- A structured set of data elements stored in columns and rows.
-## Token
+## Thrift
-- A **token** in the context of computing and security commonly refers to a
small, indivisible unit of data. Tokens play a crucial role in various domains,
including authentication, authorization, and cryptographic systems.
+- A network protocol used for communication with the Hive Metastore Service (HMS).
-## Thrift protocol
+## Token
-- The network protocol used for communication with Hive Metastore Service
(HMS).
+- A **token** in the context of computing and security is a small, indivisible
unit of data.
+ Tokens play a crucial role in various domains, including authentication and
authorization.
## Trino
@@ -360,30 +436,22 @@ license: "This software is licensed under the Apache
License version 2."
- A connector module for integrating Gravitino with Trino.
-## Trino Apache Gravitino connector documentation
-
-- Documentation providing information on using the Trino connector to access
metadata in Gravitino.
-
## Ubuntu
- A Linux distribution based on Debian, widely used for cloud computing and
servers.
## Unit test
-- A type of testing where individual components or functions of a program are
tested to ensure they work as expected in isolation.
-
-## URI
-
-- Uniform Resource Identifier, a string that identifies the name or resource
on the internet.
+- A type of software testing where individual components or functions of a
program are tested.
+ Unit tests help to ensure that the component or function works as expected
in isolation.
## Verification
-- The process of confirming the authenticity and integrity of a release by
checking its signature and associated hashes.
+- The process of confirming the authenticity and integrity of a release.
+ This is usually done by checking its signature and associated hash values.
-## WEB UI
+## Web UI
- A graphical interface accessible through a web browser.
-## YAML
-- YAML Ain't Markup Language, a human-readable data serialization format often
used for configuration files.