This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new c15d55c [DOC] CarbonExtensions doc
c15d55c is described below
commit c15d55c0aa0630de1a9a4d399a21d61e1a05647f
Author: QiangCai <[email protected]>
AuthorDate: Mon Jan 20 16:25:05 2020 +0800
[DOC] CarbonExtensions doc
Why is this PR needed?
explain how to use CarbonExtensions in spark
What changes were proposed in this PR?
Document is updated to introduce CarbonExtensions
Does this PR introduce any user interface change?
No
Is any new testcase added?
No
This closes #3585
---
docs/quick-start-guide.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index dedba36..f9f467c 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -39,6 +39,7 @@ This tutorial provides a quick introduction to using CarbonData. To follow along
CarbonData can be integrated with Spark, Presto and Hive execution engines. The below documentation guides on installing and configuring with these execution engines.
#### Spark
+[Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)](#installing-and-configuring-carbondata-to-run-locally-with-spark-sql)
[Installing and Configuring CarbonData to run locally with Spark Shell](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
@@ -65,12 +66,64 @@ CarbonData can be integrated with Spark,Presto and Hive execution engines. The b
#### Alluxio
[CarbonData supports read and write with Alluxio](./alluxio-guide.md)
+## Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)
+
+In the Spark SQL CLI, CarbonExtensions customizes the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
+To enable CarbonExtensions, add the following configuration:
+
+|Key|Value|
+|---|---|
+|spark.sql.extensions|org.apache.spark.sql.CarbonExtensions|
+
+Start Spark SQL CLI by running the following command in the Spark directory:
+
+```
+./bin/spark-sql --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
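+
+Alternatively, instead of passing `--conf` on every launch, the same setting can be placed in Spark's `conf/spark-defaults.conf` (a sketch; the jar path below is a placeholder for your carbondata assembly jar):
+
+```
+spark.sql.extensions  org.apache.spark.sql.CarbonExtensions
+spark.jars            /path/to/carbondata-assembly.jar
+```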
+###### Creating a Table
+
+```
+CREATE TABLE IF NOT EXISTS test_table (
+ id string,
+ name string,
+ city string,
+ age Int)
+STORED AS carbondata;
+```
+**NOTE**: CarbonExtensions only supports "STORED AS carbondata" and "USING carbondata".
+
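+The `USING` form mentioned in the note above can be written, for example, as follows (a sketch mirroring the table definition above):
+
+```
+CREATE TABLE IF NOT EXISTS test_table (
+  id string,
+  name string,
+  city string,
+  age Int)
+USING carbondata;
+```
+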
+###### Loading Data to a Table
+
+```
+LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table;
+```
+
+```
+insert into table test_table select '1', 'name1', 'city1', 1;
+```
+
+**NOTE**: Please provide the real file path of `sample.csv` for the above script.
+If you encounter a "tablestatus.lock" issue, please refer to the [FAQ](faq.md).
+
+###### Query Data from a Table
+
+```
+SELECT * FROM test_table;
+```
+
+```
+SELECT city, avg(age), sum(age)
+FROM test_table
+GROUP BY city;
+```
+
## Installing and Configuring CarbonData to run locally with Spark Shell
Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on Spark shell.
#### Basics
+###### Option 1: Using CarbonSession
Start Spark shell by running the following command in the Spark directory:
```
@@ -99,6 +152,27 @@ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
`SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<carbon_store_path>", "<local metastore path>")`.
- Data storage location can be specified by `<carbon_store_path>`, like `/carbon/data/store`, `hdfs://localhost:9000/carbon/data/store` or `s3a://carbon/data/store`.
+###### Option 2: Using SparkSession with CarbonExtensions
+
+Start Spark shell by running the following command in the Spark directory:
+
+```
+./bin/spark-shell --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
+**NOTE**
+ - In this flow, we can use the built-in SparkSession `spark` instead of `carbon`.
+   We can also create a new SparkSession instead of the built-in SparkSession `spark` if needed.
+   It requires adding "org.apache.spark.sql.CarbonExtensions" to the Spark configuration "spark.sql.extensions".
+ ```
+ val newSpark = SparkSession
+   .builder()
+   .config(sc.getConf)
+   .enableHiveSupport()
+   .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
+   .getOrCreate()
+ ```
+ - Data storage location can be specified by "spark.sql.warehouse.dir".
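+   For example, the warehouse location can be set when launching the shell (a sketch; the warehouse path is a placeholder):
+ ```
+ ./bin/spark-shell \
+   --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions \
+   --conf spark.sql.warehouse.dir=/path/to/carbon/warehouse \
+   --jars <carbondata assembly jar path>
+ ```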
+
#### Executing Queries
###### Creating a Table
@@ -114,6 +188,17 @@ carbon.sql(
| STORED AS carbondata
""".stripMargin)
```
+**NOTE**:
+The following table lists all supported syntax:
+
+|create table |SparkSession with CarbonExtensions | CarbonSession|
+|---|---|---|
+| STORED AS carbondata|yes|yes|
+| USING carbondata|yes|yes|
+| STORED BY 'carbondata'|no|yes|
+| STORED BY 'org.apache.carbondata.format'|no|yes|
+
+We suggest using CarbonExtensions instead of CarbonSession.
###### Loading Data to a Table