This is an automated email from the ASF dual-hosted git repository.
yuqi4733 pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/branch-0.6 by this push:
new 8a9df9030 [#4767] fix(docs): Update the document of playground (#4774)
8a9df9030 is described below
commit 8a9df9030edb04319ac8a439e113747d81e3e058
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Aug 29 16:35:03 2024 +0800
[#4767] fix(docs): Update the document of playground (#4774)
### What changes were proposed in this pull request?
Update the document of playground
### Why are the changes needed?
Fix: #4767
Just update the document according to the latest playground document.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Just documents.
Co-authored-by: roryqi <[email protected]>
---
docs/how-to-use-the-playground.md | 80 +++++++++++++++++++++++++++++++--------
1 file changed, 64 insertions(+), 16 deletions(-)
diff --git a/docs/how-to-use-the-playground.md b/docs/how-to-use-the-playground.md
index 1b092d40f..f1a49bd9f 100644
--- a/docs/how-to-use-the-playground.md
+++ b/docs/how-to-use-the-playground.md
@@ -19,14 +19,14 @@ Install Git and Docker Compose.
The playground runs several services. The TCP ports used may clash with
existing services you run, such as MySQL or Postgres.
-| Docker container | Ports used |
-|-----------------------|----------------------|
-| playground-gravitino | 8090 9001 |
-| playground-hive | 3307 9003 9084 50071 |
-| playground-mysql | 13306 |
-| playground-postgresql | 15342 |
-| playground-trino | 18080 |
-| playground-jupyter | 18888 |
+| Docker container | Ports used |
+|-----------------------|------------------------|
+| playground-gravitino | 8090 9001 |
+| playground-hive | 3307 19000 19083 60070 |
+| playground-mysql | 13306 |
+| playground-postgresql | 15342 |
+| playground-trino | 18080 |
+| playground-jupyter | 18888 |
## Start playground
@@ -38,8 +38,8 @@ cd gravitino-playground
./launch-playground.sh
```
-### Launching a component of the playground
+### Launch specific components of the playground
```shell
git clone git@github.com:apache/gravitino-playground.git
cd gravitino-playground
@@ -52,7 +52,7 @@ Note. Components have dependencies, so not launching all components may prevent
### Using Trino CLI in Docker Container
-1. Log in to the Gravitino playground Trino Docker container using the following command:
+1. Login to the Gravitino playground Trino Docker container using the following command:
```shell
docker exec -it playground-trino bash
@@ -64,7 +64,7 @@ docker exec -it playground-trino bash
trino@container_id:/$ trino
```
-### Using Jupyter Notebook
+## Using Jupyter Notebook
1. Open the Jupyter Notebook in the browser at [http://localhost:18888](http://localhost:18888).
@@ -72,9 +72,23 @@ trino@container_id:/$ trino
3. Start the notebook and run the cells.
+## Using Spark client
+
+1. Log in to the Gravitino playground Spark Docker container using the following command:
+
+```shell
+docker exec -it playground-spark bash
+```
+
+2. Open the Spark SQL client in the container.
+
+```shell
+spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
+```
+
## Example
-### Simple queries
+### Simple Trino queries
You can use simple queries to test in the Trino CLI.
@@ -145,6 +159,38 @@ WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
GROUP BY e.employee_id, given_name, family_name;
```
+### Using Spark and Trino
+
+You might consider generating data with SparkSQL and then querying this data using Trino. Give it a try with Gravitino:
+
+1. Log in to the Spark container and execute the following SQL statements:
+
+```sql
+-- Use the Hive catalog to create a Hive table
+USE catalog_hive;
+CREATE DATABASE product;
+USE product;
+
+CREATE TABLE IF NOT EXISTS employees (
+ id INT,
+ name STRING,
+ age INT
+)
+PARTITIONED BY (department STRING)
+STORED AS PARQUET;
+DESC TABLE EXTENDED employees;
+
+INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
+INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);
+```
+
+2. Log in to the Trino container and execute the following SQL statement:
+
+```sql
+SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering';
+```
+
+
### Using Apache Iceberg REST service
Suppose you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
@@ -155,12 +201,14 @@ Then, you can use Trino to read the data from the Hive table and join it with th
```text
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
-spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
-spark.sql.catalog.catalog_iceberg.type rest
-spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
+spark.sql.catalog.catalog_rest org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.catalog_rest.type rest
+spark.sql.catalog.catalog_rest.uri http://gravitino:9001/iceberg/
spark.locality.wait.node 0
```
+Please note that `catalog_rest` in SparkSQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, which implies that they can access the same dataset.
+
1. Login Spark container and execute the steps.
```shell
@@ -172,7 +220,7 @@ spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
```
```SQL
-use catalog_iceberg;
+use catalog_rest;
create database sales;
use sales;
create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));