This is an automated email from the ASF dual-hosted git repository.
ulyssesyou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new 34c58b9 [KYUUBI #1335] Spell issue branch
34c58b9 is described below
commit 34c58b9133dc87cd93756443bf494c44c969f18f
Author: AnybodyHome <[email protected]>
AuthorDate: Fri Nov 5 09:33:32 2021 +0800
[KYUUBI #1335] Spell issue branch
### _Why are the changes needed?_
Spell check and punctuation check.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request
Closes #1335 from zhenjiaguo/spell-issue-branch.
Closes #1335
b4d48192 [AnybodyHome] recover beeline change
85603b6f [AnybodyHome] spell check and punctuation check
Authored-by: AnybodyHome <[email protected]>
Signed-off-by: ulysses-you <[email protected]>
---
docs/overview/architecture.md | 2 +-
docs/quick_start/quick_start.md | 16 +++++-----
docs/quick_start/quick_start_with_datagrip.md | 2 +-
docs/quick_start/quick_start_with_helm.md | 6 ++--
docs/quick_start/quick_start_with_jdbc.md | 2 +-
docs/sql/rules.md | 2 +-
docs/sql/z-order-benchmark.md | 46 +++++++++++++--------------
docs/tools/spark_block_cleaner.md | 10 +++---
8 files changed, 43 insertions(+), 43 deletions(-)
diff --git a/docs/overview/architecture.md b/docs/overview/architecture.md
index 0f1b4c0..7c8d770 100644
--- a/docs/overview/architecture.md
+++ b/docs/overview/architecture.md
@@ -82,7 +82,7 @@ Next, let us share some of the key design concepts of Kyuubi.
Kyuubi implements the [Hive Service RPC](https://mvnrepository.com/artifact/org.apache.hive/hive-service-rpc/2.3.9) module,
which provides the same way of accessing data as HiveServer2 and Spark Thrift Server.
-On the client side,you can build fantastic business reports, BI applications, or even ETL jobs only via the [Hive JDBC](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc/2.3.9) module.
+On the client side, you can build fantastic business reports, BI applications, or even ETL jobs only via the [Hive JDBC](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc/2.3.9) module.
You only need to be familiar with Structured Query Language (SQL) and Java Database Connectivity (JDBC) to handle massive data.
It helps you focus on the design and implementation of your business system.
diff --git a/docs/quick_start/quick_start.md b/docs/quick_start/quick_start.md
index fa89c62..f3d72dd 100644
--- a/docs/quick_start/quick_start.md
+++ b/docs/quick_start/quick_start.md
@@ -45,7 +45,7 @@ Java | Java<br>Runtime<br>Environment | Required | Java 8/11 | Kyuubi is pre-bui
Spark | Distributed<br>SQL<br>Engine | Required | 3.0.0 and above | By default Kyuubi binary release is delivered without<br> a Spark tarball.
HDFS | Distributed<br>File<br>System | Optional | referenced<br>by<br>Spark | Hadoop Distributed File System is a <br>part of Hadoop framework, used to<br> store and process the datasets.<br> You can interact with any<br> Spark-compatible versions of HDFS.
Hive | Metastore | Optional | referenced<br>by<br>Spark | Hive Metastore for Spark SQL to connect
-Zookeeper | Service<br>Discovery | Optional | Any<br>zookeeper<br>ensemble<br>compatible<br>with<br>curator(2.12.0) | By default, Kyuubi provides a<br> embeded Zookeeper server inside for<br> non-production use.
+Zookeeper | Service<br>Discovery | Optional | Any<br>zookeeper<br>ensemble<br>compatible<br>with<br>curator(2.12.0) | By default, Kyuubi provides a<br> embedded Zookeeper server inside for<br> non-production use.
Additionally, if you want to work with other Spark compatible systems or plugins, you only need to take care of them as using them with regular Spark applications.
For example, you can run Spark SQL engines created by the Kyuubi on any cluster manager, including YARN, Kubernetes, Mesos, e.t.c...
@@ -93,14 +93,14 @@ From top to bottom are:
- DISCLAIMER: the disclaimer made by Apache Kyuubi Community as a project still in ASF Incubator.
- LICENSE: the [APACHE LICENSE, VERSION 2.0](https://www.apache.org/licenses/LICENSE-2.0) we claim to obey.
- RELEASE: the build information of this package.
-- NOTICE: the natice made by Apache Kyuubi Community about its project and dependencies.
+- NOTICE: the notice made by Apache Kyuubi Community about its project and dependencies.
- bin: the entry of the Kyuubi server with `kyuubi` as the startup script.
- conf: all the defaults used by Kyuubi Server itself or creating a session with Spark applications.
- externals
- engines: contains all kinds of SQL engines that we support, e.g. Apache Spark, Apache Flink(coming soon).
-- licenses: a bunch of licenses included
+- licenses: a bunch of licenses included.
- jars: packages needed by the Kyuubi server.
-- logs: Where the logs of the Kyuubi server locates.
+- logs: where the logs of the Kyuubi server locates.
- pid: stores the PID file of the Kyuubi server instance.
- work: the root of the working directories of all the forked sub-processes, a.k.a. SQL engines.
@@ -110,7 +110,7 @@ As mentioned above, for a quick start deployment, then only you need to be sure
### Setup JAVA
-You can either set it system-widely, e.g. in the `.bashrc` file.
+You can either set it system-widely, e.g. in the `.bashrc` file.
```bash
java -version
@@ -123,7 +123,7 @@ Or, `export JAVA_HOME=/path/to/java` in the local os session.
```bash
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.5.jdk/Contents/Home
- java -version
+java -version
java version "11.0.5" 2019-10-15 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.5+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode)
@@ -214,7 +214,7 @@ In this case, the session will create for the user named 'anonymous'.
Kyuubi will create a Spark SQL engine application using `kyuubi-spark-sql-engine_2.12-<version>.jar`.
It will cost awhile for the application to be ready before fully establishing the session.
-Otherwise, an existing application will be resued, and the time cost here is negligible.
+Otherwise, an existing application will be reused, and the time cost here is negligible.
Similarly, you can create a session for another user(or principal, subject, and maybe something else you defined), e.g. named `kentyao`,
@@ -317,7 +317,7 @@ Closing: 0: jdbc:hive2://localhost:10009/
Stop Kyuubi by running the following in the `$KYUUBI_HOME` directory:
```bash
-bin/kyuubi.sh stop
+bin/kyuubi stop
```
And then, you will see the KyuubiServer waving goodbye to you.
diff --git a/docs/quick_start/quick_start_with_datagrip.md b/docs/quick_start/quick_start_with_datagrip.md
index a37e18f..dc54d04 100644
--- a/docs/quick_start/quick_start_with_datagrip.md
+++ b/docs/quick_start/quick_start_with_datagrip.md
@@ -40,7 +40,7 @@ You should first download the missing driver files. Just click on the link below
### Generic JDBC Connection Settings
After install drivers, you should configure the right host and port which you can find in kyuubi server log. By default, we use `localhost` and `10009` to configure.
-Of curse, you can fill other configs.
+Of course, you can fill other configs.
After generic configs, you can use test connection to test.
diff --git a/docs/quick_start/quick_start_with_helm.md b/docs/quick_start/quick_start_with_helm.md
index cf4c6f2..5545245 100644
--- a/docs/quick_start/quick_start_with_helm.md
+++ b/docs/quick_start/quick_start_with_helm.md
@@ -42,7 +42,7 @@ cretate ns kyuubi
```bash
helm install kyuubi-helm ${KYUUBI_HOME}/docker/helm -n ${namespace_name}
```
-It will print variables and the way to get kyuubi expose ip and port
+It will print variables and the way to get kyuubi expose ip and port.
```bash
NAME: kyuubi-helm
LAST DEPLOYED: Wed Oct 20 15:22:47 2021
@@ -67,7 +67,7 @@ helm uninstall kyuubi-helm -n ${namespace_name}
#### Edit server config
-Modify `values.yaml` under `${KYUUBI_HOME}/docker/helm`
+Modify `values.yaml` under `${KYUUBI_HOME}/docker/helm`:
```yaml
# Kyuubi server numbers
replicaCount: 2
@@ -105,7 +105,7 @@ NAME READY STATUS RESTARTS AGE
kyuubi-server-585d8944c5-m7j5s 1/1 Running 0 30m
kyuubi-server-32sdsa1245-2d2sj 1/1 Running 0 30m
```
-then, use pod name to get logs
+then, use pod name to get logs:
```bash
kubectl -n ${namespace_name} logs kyuubi-server-585d8944c5-m7j5s
```
diff --git a/docs/quick_start/quick_start_with_jdbc.md b/docs/quick_start/quick_start_with_jdbc.md
index bc84098..a06f8f0 100644
--- a/docs/quick_start/quick_start_with_jdbc.md
+++ b/docs/quick_start/quick_start_with_jdbc.md
@@ -35,7 +35,7 @@ Add repository to your maven configuration file which may reside in `$MAVEN_HOME
<name>central maven repo https</name>
<url>https://repo.maven.apache.org/maven2</url>
</repository>
-<repositories>
+</repositories>
```
You can add below dependency to your `pom.xml` file in your application.
diff --git a/docs/sql/rules.md b/docs/sql/rules.md
index ee7b330..052612c 100644
--- a/docs/sql/rules.md
+++ b/docs/sql/rules.md
@@ -24,7 +24,7 @@
# Auxiliary SQL extension for Spark SQL
Kyuubi provides SQL extension out of box. Due to the version compatibility with Apache Spark, currently we only support Apache Spark branch-3.1 (i.e 3.1.1 and 3.1.2).
-And don't worry, Kyuubi will support the new Apache Spark version in future. Thanks to the adaptive query execution framework (AQE), Kyuubi can do these optimization.
+And don't worry, Kyuubi will support the new Apache Spark version in the future. Thanks to the adaptive query execution framework (AQE), Kyuubi can do these optimizations.
## What feature does Kyuubi SQL extension provide
- merging small files automatically
diff --git a/docs/sql/z-order-benchmark.md b/docs/sql/z-order-benchmark.md
index 5beb180..d04f630 100644
--- a/docs/sql/z-order-benchmark.md
+++ b/docs/sql/z-order-benchmark.md
@@ -23,25 +23,25 @@
# Z-order Benchmark
-Z-order is a technique that allows you to map multidimensional data to a single dimension. We did a performance test
+Z-order is a technique that allows you to map multidimensional data to a single dimension. We did a performance test.
-for this test ,we used aliyun Databricks Delta test case
-https://help.aliyun.com/document_detail/168137.html?spm=a2c4g.11186623.6.563.10d758ccclYtVb
+For this test ,we used aliyun Databricks Delta test case
+https://help.aliyun.com/document_detail/168137.html?spm=a2c4g.11186623.6.563.10d758ccclYtVb.
Prepare data for the three scenarios:
-1. 10 billion data and 2 hundred files(parquet files): for big file(1G)
-2. 10 billion data and 1 thousand files(parquet files): for medium file(200m)
-3. one billion data and 10 hundred files(parquet files): for smaller file(200k)
+1. 10 billion data and 2 hundred files (parquet files): for big file(1G)
+2. 10 billion data and 1 thousand files (parquet files): for medium file(200m)
+3. 1 billion data and 10 thousand files (parquet files): for smaller file(200k)
-test env:
+Test env:
spark-3.1.2
hadoop-2.7.2
-kyubbi-1.4.0
+kyuubi-1.4.0
-test step:
+Test step:
-Step1: create hive tables
+Step1: create hive tables.
```scala
spark.sql(s"drop database if exists $dbName cascade")
@@ -55,8 +55,8 @@ spark.sql(s"create table $connZorder (src_ip string, src_port int, dst_ip string
spark.sql(s"show tables").show(false)
```
-Step2: prepare data for parquet table with three scenarios
-we use the following code
+Step2: prepare data for parquet table with three scenarios,
+we use the following code.
```scala
def randomIPv4(r: Random) = Seq.fill(4)(r.nextInt(256)).mkString(".")
@@ -67,14 +67,14 @@ def randomConnRecord(r: Random) = ConnRecord(
dst_ip = randomIPv4(r), dst_port = randomPort(r))
```
-Step3: do optimize with z-order only ip and do optimize with order by only ip, sort column: src_ip, dst_ip and shuffle partition just as file numbers .
+Step3: do optimize with z-order only ip and do optimize with order by only ip, sort column: src_ip, dst_ip and shuffle partition just as file numbers.
```
INSERT overwrite table conn_order_only_ip select src_ip, src_port, dst_ip, dst_port from conn_random_parquet order by src_ip, dst_ip;
OPTIMIZE conn_zorder_only_ip ZORDER BY src_ip, dst_ip;
```
-Step4: do optimize with z-order and do optimize with order by , sort column: src_ip, src_port, dst_ip, dst_port and shuffle partition just as file numbers .
+Step4: do optimize with z-order and do optimize with order by, sort column: src_ip, src_port, dst_ip, dst_port and shuffle partition just as file numbers.
```
INSERT overwrite table conn_order select src_ip, src_port, dst_ip, dst_port from conn_random_parquet order by src_ip, src_port, dst_ip, dst_port;
@@ -82,7 +82,7 @@ OPTIMIZE conn_zorder ZORDER BY src_ip, src_port, dst_ip, dst_port;
```
-The complete code is as follows:
+The complete code is as follows:
```shell
./spark-shell
@@ -191,20 +191,20 @@ select count(*) from conn_zorder where src_ip like '157%' and dst_ip like '216.%
## Benchmark result
We have done two performance tests: one is to compare the efficiency of Z-order Optimize and Order by Sort,
-and the other is to query based on the optimized Z-order by data and Random data
+and the other is to query based on the optimized Z-order by data and Random data.
### Efficiency of Z-order Optimize and Order-by Sort
-**10 billion data and 1000 files and Query resource:200 core 600G memory**
+**10 billion data and 1000 files and Query resource: 200 core 600G memory**
-z-order by or order by only ip
+Z-order by or order by only ip:
| Table | row count | optimize time |
| ------------------- | -------------- | ------------------ |
| conn_order_only_ip | 10,000,000,000 | 1591.99 s |
| conn_zorder_only_ip | 10,000,000,000 | 8371.405 s |
-z-order by or order by all columns
+Z-order by or order by all columns:
| Table | row count | optimize time |
| ------------------- | -------------- | ------------------ |
@@ -213,9 +213,9 @@ z-order by or order by all columns
### Z-order by benchmark result
-by querying the tables before and after optimization, we find that
+By querying the tables before and after optimization, we find that:
-**10 billion data and 200 files and Query resource:200 core 600G memory**
+**10 billion data and 200 files and Query resource: 200 core 600G memory**
| Table | Average File Size | Scan row count | Average query time | row count Skipping ratio |
| ------------------- | ----------------- | -------------- | ------------------ | ------------------------ |
@@ -225,7 +225,7 @@ by querying the tables before and after optimization, we find that
-**10 billion data and 1000 files and Query resource:200 core 600G memory**
+**10 billion data and 1000 files and Query resource: 200 core 600G memory**
| Table | Average File Size | Scan row count | Average query time | row count Skipping ratio |
| ------------------- | ----------------- | -------------- | ------------------ | ------------------------ |
@@ -235,7 +235,7 @@ by querying the tables before and after optimization, we find that
-**1 billion data and 10000 files and Query resource:10 core 40G memory**
+**1 billion data and 10000 files and Query resource: 10 core 40G memory**
| Table | Average File Size | Scan row count | Average query time | row count Skipping ratio |
| ------------------- | ----------------- | -------------- | ------------------ | ------------------------ |
diff --git a/docs/tools/spark_block_cleaner.md b/docs/tools/spark_block_cleaner.md
index 8514fa9..c391005 100644
--- a/docs/tools/spark_block_cleaner.md
+++ b/docs/tools/spark_block_cleaner.md
@@ -56,16 +56,16 @@ Before you start using Spark Block Cleaner, you should build its docker images.
In the `KYUUBI_HOME` directory, you can use the following cmd to build docker image.
```shell
- docker build ./tools/spark-block-cleaner/kubernetes/docker
+docker build ./tools/spark-block-cleaner/kubernetes/docker
```
### Modify spark-block-cleaner.yml
You need to modify the `${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` to fit your current environment.
-In Kyuubi tools, we recommend using `DaemonSet` to start , and we offer default yaml file in daemonSet way.
+In Kyuubi tools, we recommend using `DaemonSet` to start, and we offer default yaml file in daemonSet way.
-Base file structure :
+Base file structure:
```yaml
apiVersion
kind
@@ -128,7 +128,7 @@ After you finishing modifying the above, you can use the following command `kube
Name | Default | unit | Meaning
--- | --- | --- | ---
CACHE_DIRS | /data/data1,/data/data2| | The target dirs in container path which will clean block files.
-FILE_EXPIRED_TIME | 604800 | seconds | Cleaner will clean the block files which current time - last modified time more than the fileExpiredTime.
-DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | Deep clean will clean the block files which current time - last modified time more than the deepCleanFileExpiredTime.
+FILE_EXPIRED_TIME | 604800 | seconds | Cleaner will clean the block files which current time - last modified time more than the fileExpiredTime.
+DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | Deep clean will clean the block files which current time - last modified time more than the deepCleanFileExpiredTime.
FREE_SPACE_THRESHOLD | 60 | % | After first clean, if free Space low than threshold trigger deep clean.
SCHEDULE_INTERVAL | 3600 | seconds | Cleaner sleep between cleaning.