This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new c15d55c [DOC] CarbonExtensions doc
c15d55c is described below
commit c15d55c0aa0630de1a9a4d399a21d61e1a05647f
Author: QiangCai <[email protected]>
AuthorDate: Mon Jan 20 16:25:05 2020 +0800
[DOC] CarbonExtensions doc
Why is this PR needed?
explain how to use CarbonExtensions in spark
What changes were proposed in this PR?
Document is updated to introduce CarbonExtensions
Does this PR introduce any user interface change?
No
Is any new testcase added?
No
This closes #3585
---
docs/quick-start-guide.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index dedba36..f9f467c 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -39,6 +39,7 @@ This tutorial provides a quick introduction to using CarbonData. To follow along
CarbonData can be integrated with Spark, Presto and Hive execution engines. The below documentation guides on installing and configuring with these execution engines.
#### Spark
+[Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)](#installing-and-configuring-carbondata-to-run-locally-with-spark-sql)
[Installing and Configuring CarbonData to run locally with Spark Shell](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
@@ -65,12 +66,64 @@ CarbonData can be integrated with Spark,Presto and Hive execution engines. The b
#### Alluxio
[CarbonData supports read and write with Alluxio](./alluxio-guide.md)
+## Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)
+
+In the Spark SQL CLI, CarbonExtensions customizes the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
+To enable CarbonExtensions, add the following configuration:
+
+|Key|Value|
+|---|---|
+|spark.sql.extensions|org.apache.spark.sql.CarbonExtensions|
+
+Start Spark SQL CLI by running the following command in the Spark directory:
+
+```
+./bin/spark-sql --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
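+
+Alternatively, instead of passing `--conf` on every launch, the same setting can be placed in Spark's `conf/spark-defaults.conf` (a sketch; the jar path below is a placeholder for your carbondata assembly jar):
+
+```
+spark.sql.extensions  org.apache.spark.sql.CarbonExtensions
+spark.jars            /path/to/carbondata-assembly.jar
+```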
+###### Creating a Table
+
+```
+CREATE TABLE IF NOT EXISTS test_table (
+ id string,
+ name string,
+ city string,
+ age Int)
+STORED AS carbondata;
+```
+**NOTE**: CarbonExtensions only supports "STORED AS carbondata" and "USING carbondata".
+
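+The `USING` form mentioned in the note above can be written, for example, as follows (a sketch mirroring the table definition above):
+
+```
+CREATE TABLE IF NOT EXISTS test_table (
+  id string,
+  name string,
+  city string,
+  age Int)
+USING carbondata;
+```
+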
+###### Loading Data to a Table
+
+```
+LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table;
+```
+
+```
+insert into table test_table select '1', 'name1', 'city1', 1;
+```
+
+**NOTE**: Please provide the real file path of `sample.csv` for the above script.
+If you encounter a "tablestatus.lock" issue, please refer to the [FAQ](faq.md).
+
+###### Query Data from a Table
+
+```
+SELECT * FROM test_table;
+```
+
+```
+SELECT city, avg(age), sum(age)
+FROM test_table
+GROUP BY city;
+```
+
## Installing and Configuring CarbonData to run locally with Spark Shell
Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on Spark shell.
#### Basics
+###### Option 1: Using CarbonSession
Start Spark shell by running the following command in the Spark directory:
```
@@ -99,6 +152,27 @@ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
`SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<carbon_store_path>", "<local metastore path>")`.
- Data storage location can be specified by `<carbon_store_path>`, like `/carbon/data/store`, `hdfs://localhost:9000/carbon/data/store` or `s3a://carbon/data/store`.
+###### Option 2: Using SparkSession with CarbonExtensions
+
+Start Spark shell by running the following command in the Spark directory:
+
+```
+./bin/spark-shell --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
+**NOTE**
+ - In this flow, we can use the built-in SparkSession `spark` instead of `carbon`.
+   We can also create a new SparkSession instead of the built-in SparkSession `spark` if needed.
+   It requires adding "org.apache.spark.sql.CarbonExtensions" to the Spark configuration "spark.sql.extensions".
+ ```
+ val newSpark = SparkSession
+   .builder()
+   .config(sc.getConf)
+   .enableHiveSupport()
+   .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
+   .getOrCreate()
+ ```
+ - Data storage location can be specified by "spark.sql.warehouse.dir".
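+   For example, the warehouse location can be set when launching the shell (a sketch; the warehouse path is a placeholder):
+ ```
+ ./bin/spark-shell \
+   --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions \
+   --conf spark.sql.warehouse.dir=/path/to/carbon/warehouse \
+   --jars <carbondata assembly jar path>
+ ```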
+
#### Executing Queries
###### Creating a Table
@@ -114,6 +188,17 @@ carbon.sql(
| STORED AS carbondata
""".stripMargin)
```
+**NOTE**:
+The following table lists all supported syntax:
+
+|create table |SparkSession with CarbonExtensions | CarbonSession|
+|---|---|---|
+| STORED AS carbondata|yes|yes|
+| USING carbondata|yes|yes|
+| STORED BY 'carbondata'|no|yes|
+| STORED BY 'org.apache.carbondata.format'|no|yes|
+
+We suggest using CarbonExtensions instead of CarbonSession.
###### Loading Data to a Table