(gravitino) branch branch-0.6 updated: [#4767] fix(docs): Update the document of playground (#4774)

yuqi4733 Thu, 29 Aug 2024 01:37:20 -0700

This is an automated email from the ASF dual-hosted git repository.

yuqi4733 pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/gravitino.git



The following commit(s) were added to refs/heads/branch-0.6 by this push:
     new 8a9df9030 [#4767] fix(docs): Update the document of playground (#4774)
8a9df9030 is described below

commit 8a9df9030edb04319ac8a439e113747d81e3e058
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Aug 29 16:35:03 2024 +0800

    [#4767] fix(docs): Update the document of playground (#4774)
    
    ### What changes were proposed in this pull request?
    
    Update the document of playground
    
    ### Why are the changes needed?
    
    Fix: #4767
    Just update the document according to the latest playground document.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Just documents.
    
    Co-authored-by: roryqi <[email protected]>
---
 docs/how-to-use-the-playground.md | 80 +++++++++++++++++++++++++++++++--------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/docs/how-to-use-the-playground.md b/docs/how-to-use-the-playground.md
index 1b092d40f..f1a49bd9f 100644
--- a/docs/how-to-use-the-playground.md
+++ b/docs/how-to-use-the-playground.md
@@ -19,14 +19,14 @@ Install Git and Docker Compose.
 
 The playground runs several services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.
 
-| Docker container      | Ports used           |
-|-----------------------|----------------------|
-| playground-gravitino  | 8090 9001            |
-| playground-hive       | 3307 9003 9084 50071 |
-| playground-mysql      | 13306                |
-| playground-postgresql | 15342                |
-| playground-trino      | 18080                |
-| playground-jupyter    | 18888                |
+| Docker container      | Ports used             |
+|-----------------------|------------------------|
+| playground-gravitino  | 8090 9001              |
+| playground-hive       | 3307 19000 19083 60070 |
+| playground-mysql      | 13306                  |
+| playground-postgresql | 15342                  |
+| playground-trino      | 18080                  |
+| playground-jupyter    | 18888                  |
 
 ## Start playground
 
@@ -38,8 +38,8 @@ cd gravitino-playground
 ./launch-playground.sh
 ```
 
-### Launching a component of the playground
 
+### Launch special component or components of playground
 ```shell
 git clone git@github.com:apache/gravitino-playground.git
 cd gravitino-playground
@@ -52,7 +52,7 @@ Note. Components have dependencies, so not launching all components may prevent
 
 ### Using Trino CLI in Docker Container
 
-1. Log in to the Gravitino playground Trino Docker container using the following command:
+1. Login to the Gravitino playground Trino Docker container using the following command:
 
 ```shell
 docker exec -it playground-trino bash
@@ -64,7 +64,7 @@ docker exec -it playground-trino bash
 trino@container_id:/$ trino
 ```
 
-### Using Jupyter Notebook
+## Using Jupyter Notebook
 
 1. Open the Jupyter Notebook in the browser at [http://localhost:18888](http://localhost:18888).
 
@@ -72,9 +72,23 @@ trino@container_id:/$ trino
 
 3. Start the notebook and run the cells.
 
+## Using Spark client
+
+1. Login to the Gravitino playground Spark Docker container using the following command:
+
+```shell
+docker exec -it playground-spark bash
+````
+
+2. Open the Spark SQL client in the container.
+
+```shell
+spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql 
+```
+
 ## Example
 
-### Simple queries
+### Simple Trino queries
 
 You can use simple queries to test in the Trino CLI.
 
@@ -145,6 +159,38 @@ WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
 GROUP BY e.employee_id,  given_name, family_name;
 ```
 
+### Using Spark and Trino
+
+You might consider generating data with SparkSQL and then querying this data using Trino. Give it a try with Gravitino:
+
+1. Login Spark container and execute the SQLs:
+
+```sql
+// using Hive catalog to create Hive table
+USE catalog_hive;
+CREATE DATABASE product;
+USE product;
+
+CREATE TABLE IF NOT EXISTS employees (
+    id INT,
+    name STRING,
+    age INT
+)
+PARTITIONED BY (department STRING)
+STORED AS PARQUET;
+DESC TABLE EXTENDED employees;
+
+INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
+INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);
+```
+
+2. Login Trino container and execute SQLs:
+
+```sql
+SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering';
+```
+
+
 ### Using Apache Iceberg REST service
 
 Suppose you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
@@ -155,12 +201,14 @@ Then, you can use Trino to read the data from the Hive table and join it with th
 
 ```text
 spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
-spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
-spark.sql.catalog.catalog_iceberg.type rest
-spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
+spark.sql.catalog.catalog_rest org.apache.iceberg.spark.SparkCatalog
+spark.sql.catalog.catalog_rest.type rest
+spark.sql.catalog.catalog_rest.uri http://gravitino:9001/iceberg/
 spark.locality.wait.node 0
 ```
 
+Please note that `catalog_rest` in SparkSQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, which implies that they can access the same dataset.
+
 1. Login Spark container and execute the steps.
 
 ```shell
@@ -172,7 +220,7 @@ spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
 ```
 
 ```SQL
-use catalog_iceberg;
+use catalog_rest;
 create database sales;
 use sales;
 create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));
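
Note on the updated port table: since this commit changes the host ports that playground-hive uses (19000, 19083 and 60070), it may be worth checking for clashes before launching. The following is only a rough sketch, not part of the commit or of the playground scripts. The names `PLAYGROUND_PORTS`, `port_in_use` and `check_ports` are made up for illustration, and the probe assumes bash (it relies on bash's `/dev/tcp` redirection) and that the services would bind on localhost.

```shell
#!/usr/bin/env bash
# Host ports copied from the updated table in docs/how-to-use-the-playground.md.
PLAYGROUND_PORTS="8090 9001 3307 19000 19083 60070 13306 15342 18080 18888"

# Hypothetical helper: succeeds if something accepts TCP connections on
# 127.0.0.1:<port>. The subshell keeps the probe's file descriptor from
# leaking into the calling shell.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Hypothetical helper: reports every busy port among its arguments and
# returns non-zero if at least one of them is busy.
check_ports() {
  local busy=0 port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port $port is already in use"
      busy=1
    fi
  done
  return "$busy"
}
```

One possible use is `check_ports $PLAYGROUND_PORTS && ./launch-playground.sh`, which would start the playground only when none of the listed ports already has a listener.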
