This is an automated email from the ASF dual-hosted git repository.
chenliang613 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 208afe9337 Add new example:Using CarbonData to visualization in notebook (#4318)
208afe9337 is described below
commit 208afe93370cc5b7055a600958ffcd0b8b11b231
Author: Bo Xu <[email protected]>
AuthorDate: Mon Jun 26 21:36:39 2023 +0800
Add new example:Using CarbonData to visualization in notebook (#4318)
* Add new example:Using CarbonData to visualization in notebook
* Update the example:Using CarbonData in notebook
---
...book.png => using-carbondata-in-notebook-1.png} | Bin
docs/images/using-carbondata-in-notebook-2.png | Bin 0 -> 364452 bytes
docs/images/using-carbondata-in-notebook-3.png | Bin 0 -> 279560 bytes
...sing-carbondata-in-notebook-visualization-0.png | Bin 0 -> 204393 bytes
...sing-carbondata-in-notebook-visualization-1.png | Bin 0 -> 422103 bytes
...sing-carbondata-in-notebook-visualization-2.png | Bin 0 -> 268091 bytes
...sing-carbondata-in-notebook-visualization-3.png | Bin 0 -> 342120 bytes
docs/images/using-carbondata-in-notebook2.png | Bin 350850 -> 0 bytes
docs/notebook/carbondata_notebook.ipynb | 185 +++++++++++++++
.../carbondata_notebook_with_visualization.ipynb | 263 +++++++++++++++++++++
docs/notebook/sample_data_simple.csv | 101 ++++++++
docs/quick-start-guide.md | 3 +-
docs/using-carbondata-in-notebook.md | 7 +-
...ing-carbondata-to-visualization_in-notebook.md} | 12 +-
14 files changed, 564 insertions(+), 7 deletions(-)
diff --git a/docs/images/using-carbondata-in-notebook.png b/docs/images/using-carbondata-in-notebook-1.png
similarity index 100%
rename from docs/images/using-carbondata-in-notebook.png
rename to docs/images/using-carbondata-in-notebook-1.png
diff --git a/docs/images/using-carbondata-in-notebook-2.png b/docs/images/using-carbondata-in-notebook-2.png
new file mode 100644
index 0000000000..43e9b1c56d
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-2.png differ
diff --git a/docs/images/using-carbondata-in-notebook-3.png b/docs/images/using-carbondata-in-notebook-3.png
new file mode 100644
index 0000000000..c6cc68a79b
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-3.png differ
diff --git a/docs/images/using-carbondata-in-notebook-visualization-0.png b/docs/images/using-carbondata-in-notebook-visualization-0.png
new file mode 100644
index 0000000000..a04b6b7bde
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-visualization-0.png differ
diff --git a/docs/images/using-carbondata-in-notebook-visualization-1.png b/docs/images/using-carbondata-in-notebook-visualization-1.png
new file mode 100644
index 0000000000..db6c646564
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-visualization-1.png differ
diff --git a/docs/images/using-carbondata-in-notebook-visualization-2.png b/docs/images/using-carbondata-in-notebook-visualization-2.png
new file mode 100644
index 0000000000..4e4a1d9088
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-visualization-2.png differ
diff --git a/docs/images/using-carbondata-in-notebook-visualization-3.png b/docs/images/using-carbondata-in-notebook-visualization-3.png
new file mode 100644
index 0000000000..7b3dd0c871
Binary files /dev/null and b/docs/images/using-carbondata-in-notebook-visualization-3.png differ
diff --git a/docs/images/using-carbondata-in-notebook2.png b/docs/images/using-carbondata-in-notebook2.png
deleted file mode 100644
index 47085ef16e..0000000000
Binary files a/docs/images/using-carbondata-in-notebook2.png and /dev/null differ
diff --git a/docs/notebook/carbondata_notebook.ipynb b/docs/notebook/carbondata_notebook.ipynb
new file mode 100644
index 0000000000..7a6752fb72
--- /dev/null
+++ b/docs/notebook/carbondata_notebook.ipynb
@@ -0,0 +1,185 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "910ebdc2",
+ "metadata": {},
+ "source": [
+ "## 1. Init spark with CarbonExtensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "f73459bd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pyspark.sql import SparkSession\n",
+ "spark = SparkSession.builder.enableHiveSupport().config(\"spark.sql.extensions\", \"org.apache.spark.sql.CarbonExtensions\").getOrCreate()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b73232b2",
+ "metadata": {},
+ "source": [
+ "## 2. Create CarbonData table and insert data by using spark sql"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "d458dc3a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "++\n",
+ "||\n",
+ "++\n",
+ "++\n",
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "DataFrame[]"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "spark.sql(\"DROP TABLE IF EXISTS carbondata_table\").show()\n",
+ "spark.sql(\"CREATE TABLE IF NOT EXISTS carbondata_table( age INT,adult BOOLEAN) USING carbon\")\n",
+ "spark.sql(\"INSERT INTO carbondata_table VALUES(20,true)\")\n",
+ "spark.sql(\"INSERT INTO carbondata_table VALUES(10,false)\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "edfc1eeb",
+ "metadata": {},
+ "source": [
+ "## 3. Select data from CarbonData table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "2c234b64",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+---+-----+\n",
+ "|age|adult|\n",
+ "+---+-----+\n",
+ "| 10|false|\n",
+ "| 20| true|\n",
+ "+---+-----+\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "spark.sql(\"SELECT * FROM carbondata_table\").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c5df896b",
+ "metadata": {},
+ "source": [
+ "## 4. Describe the CarbonData table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "ac65cba3",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+--------+---------+-------+\n",
+ "|col_name|data_type|comment|\n",
+ "+--------+---------+-------+\n",
+ "| age| int| null|\n",
+ "| adult| boolean| null|\n",
+ "+--------+---------+-------+\n",
+ "\n",
+ "+--------------------+--------------------+-------+\n",
+ "| col_name| data_type|comment|\n",
+ "+--------------------+--------------------+-------+\n",
+ "| age| int| null|\n",
+ "| adult| boolean| null|\n",
+ "| | | |\n",
+ "|# Detailed Table ...| | |\n",
+ "| Database| default| |\n",
+ "| Table| carbondata_table| |\n",
+ "| Owner| jovyan| |\n",
+ "| Created Time|Sat Apr 15 17:22:...| |\n",
+ "| Last Access| UNKNOWN| |\n",
+ "| Created By| Spark 3.1.1| |\n",
+ "| Type| MANAGED| |\n",
+ "| Provider| carbon| |\n",
+ "| Location|file:/home/jovyan...| |\n",
+ "| Serde Library|org.apache.carbon...| |\n",
+ "| InputFormat|org.apache.carbon...| |\n",
+ "| OutputFormat|org.apache.carbon...| |\n",
+ "+--------------------+--------------------+-------+\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "spark.sql(\"DESCRIBE carbondata_table\").show()\n",
+ "spark.sql(\"DESCRIBE FORMATTED carbondata_table\").show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bfdcae5a",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
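
For readers who want to run the cells of carbondata_notebook.ipynb outside Jupyter, here is a minimal sketch that is not part of the commit. The helper names `carbon_session`, `demo_statements` and `run_demo` are hypothetical, and the sketch assumes PySpark is installed and the CarbonData jars are already on the Spark classpath, as they appear to be in the notebook environment.

```python
def carbon_session():
    """Build a SparkSession with CarbonExtensions, mirroring cell 1 of the notebook."""
    # Imported lazily so demo_statements() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .enableHiveSupport()
        .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
        .getOrCreate()
    )


def demo_statements(table="carbondata_table"):
    """Return the SQL statements from cells 2 to 4 of the notebook, in order."""
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE IF NOT EXISTS {table}(age INT, adult BOOLEAN) USING carbon",
        f"INSERT INTO {table} VALUES(20, true)",
        f"INSERT INTO {table} VALUES(10, false)",
        f"SELECT * FROM {table}",
        f"DESCRIBE {table}",
        f"DESCRIBE FORMATTED {table}",
    ]


def run_demo(spark, table="carbondata_table"):
    """Execute every statement on the given session and print each result."""
    for statement in demo_statements(table):
        spark.sql(statement).show()
```

Calling `run_demo(carbon_session())` should reproduce the notebook flow; keeping the statements in a plain list makes them easy to inspect without starting Spark.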
diff --git a/docs/notebook/carbondata_notebook_with_visualization.ipynb b/docs/notebook/carbondata_notebook_with_visualization.ipynb
new file mode 100644
index 0000000000..fd016a358a
--- /dev/null
+++ b/docs/notebook/carbondata_notebook_with_visualization.ipynb
@@ -0,0 +1,263 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "910ebdc2",
+ "metadata": {},
+ "source": [
+ "## 1. Init spark with CarbonExtensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "f73459bd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pyspark.sql import SparkSession\n",
+ "spark = SparkSession.builder.enableHiveSupport().config(\"spark.sql.extensions\", \"org.apache.spark.sql.CarbonExtensions\").getOrCreate()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b73232b2",
+ "metadata": {},
+ "source": [
+ "## 2. Create CarbonData table and insert data by using spark sql"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "d458dc3a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "++\n",
+ "||\n",
+ "++\n",
+ "++\n",
+ "\n",
+ "+---------+\n",
+ "|namespace|\n",
+ "+---------+\n",
+ "| default|\n",
+ "+---------+\n",
+ "\n",
+ "+--------+------------------+-----------+\n",
+ "|database| tableName|isTemporary|\n",
+ "+--------+------------------+-----------+\n",
+ "| default| carbondata_table| false|\n",
+ "| default|carbondata_table_v| false|\n",
+ "+--------+------------------+-----------+\n",
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "DataFrame[Segment ID: string]"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "spark.sql(\"DROP TABLE IF EXISTS carbondata_table_v\").show()\n",
+ "spark.sql(\"CREATE TABLE IF NOT EXISTS carbondata_table_v( id INT,age INT,salary INT) USING carbondata\")\n",
+ "spark.sql(\"show databases\").show()\n",
+ "spark.sql(\"show tables\").show()\n",
+ "spark.sql(\"LOAD DATA LOCAL INPATH '/home/jovyan/sample_data_simple.csv' into table carbondata_table_v options('header'='True')\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "edfc1eeb",
+ "metadata": {},
+ "source": [
+ "## 3. Select data from CarbonData table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "2c234b64",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+---+---+------+\n",
+ "| id|age|salary|\n",
+ "+---+---+------+\n",
+ "| 1| 23| 66823|\n",
+ "| 2| 23|198373|\n",
+ "| 3| 64| 82938|\n",
+ "| 4| 20| 55245|\n",
+ "| 5| 35| 67660|\n",
+ "| 6| 29| 56483|\n",
+ "| 7| 56|173354|\n",
+ "| 8| 58| 64758|\n",
+ "| 9| 27|171463|\n",
+ "| 10| 43|197818|\n",
+ "| 11| 28|172165|\n",
+ "| 12| 44| 62913|\n",
+ "| 13| 33|113427|\n",
+ "| 14| 25| 71427|\n",
+ "| 15| 51|115963|\n",
+ "| 16| 19| 82024|\n",
+ "| 17| 41|141233|\n",
+ "| 18| 26|100990|\n",
+ "| 19| 46|144365|\n",
+ "| 20| 27|133994|\n",
+ "+---+---+------+\n",
+ "only showing top 20 rows\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "spark.sql(\"SELECT * FROM carbondata_table_v\").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c5df896b",
+ "metadata": {},
+ "source": [
+ "## 4. Describe the CarbonData table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "ac65cba3",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+--------------------+--------------------+-------+\n",
+ "| col_name| data_type|comment|\n",
+ "+--------------------+--------------------+-------+\n",
+ "| id| int| null|\n",
+ "| age| int| null|\n",
+ "| salary| int| null|\n",
+ "| | | |\n",
+ "|## Detailed Table...| | |\n",
+ "| Database| default| |\n",
+ "| Table| carbondata_table_v| |\n",
+ "| Owner| jovyan| |\n",
+ "| Created|Sat Apr 15 17:23:...| |\n",
+ "| Location |/home/jovyan/spar...| |\n",
+ "| External| false| |\n",
+ "| Transactional| true| |\n",
+ "| Streaming| false| |\n",
+ "| Table Block Size | 1024 MB| |\n",
+ "|Table Blocklet Size | 64 MB| |\n",
+ "| Comment| | |\n",
+ "| Bad Record Path| | |\n",
+ "| Date Format| | |\n",
+ "| Timestamp Format| | |\n",
+ "|Min Input Per Nod...| 0.0B| |\n",
+ "+--------------------+--------------------+-------+\n",
+ "only showing top 20 rows\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "spark.sql(\"DESCRIBE FORMATTED carbondata_table_v\").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5fce784e",
+ "metadata": {},
+ "source": [
+ "## 5. Visualization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "bfdcae5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df = spark.sql(\"SELECT age,avg(salary) as avg_salary FROM carbondata_table_v where age>=37 and age <47 group by age \")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "86130c5b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "[base64-encoded PNG of the 'Average salary' bar chart, elided]",
+ "text/plain": [
+ "<Figure size 432x288 with 1 Axes>"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plot\n",
+ "pd = df.toPandas()\n",
+ "plot.bar(pd[\"age\"], pd[\"avg_salary\"], color='blue')\n",
+ "plot.title('Average salary', fontsize=20)\n",
+ "plot.xlabel('age', fontsize=20)\n",
+ "plot.ylabel('Average salary', fontsize=20)\n",
+ "plot.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b2c67437",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
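
The notebook above is stored as nbformat 4 JSON, so its code cells can be read without Jupyter. The following is a small sketch, not part of the commit: the function name `code_cells` and the inline sample are hypothetical, and it assumes only the `cells`, `cell_type` and `source` keys visible in the diff.

```python
import json


def code_cells(notebook_text):
    """Return the source of each code cell in an nbformat 4 notebook, in order."""
    notebook = json.loads(notebook_text)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # nbformat allows "source" to be one string or a list of line strings.
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources


# Tiny stand-in notebook with one markdown cell and one code cell.
SAMPLE_NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## 5. Visualization"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["pd = df.toPandas()\n", "plot.show()"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

for cell_source in code_cells(SAMPLE_NOTEBOOK):
    print(cell_source)
```

To use it on the real file, pass the text of carbondata_notebook_with_visualization.ipynb instead of the sample string.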
diff --git a/docs/notebook/sample_data_simple.csv b/docs/notebook/sample_data_simple.csv
new file mode 100644
index 0000000000..6952695fe4
--- /dev/null
+++ b/docs/notebook/sample_data_simple.csv
@@ -0,0 +1,101 @@
+id,age,salary
+1,23,66823
+2,23,198373
+3,64,82938
+4,20,55245
+5,35,67660
+6,29,56483
+7,56,173354
+8,58,64758
+9,27,171463
+10,43,197818
+11,28,172165
+12,44,62913
+13,33,113427
+14,25,71427
+15,51,115963
+16,19,82024
+17,41,141233
+18,26,100990
+19,46,144365
+20,27,133994
+21,32,81297
+22,54,139341
+23,44,156476
+24,37,128305
+25,35,115741
+26,62,165305
+27,33,147442
+28,19,64813
+29,29,174943
+30,29,144992
+31,38,154867
+32,53,177981
+33,36,66667
+34,50,198362
+35,38,72778
+36,40,115968
+37,33,110276
+38,32,174342
+39,62,142350
+40,55,174130
+41,59,134829
+42,50,143393
+43,30,87038
+44,54,133330
+45,52,158074
+46,49,77061
+47,64,183062
+48,30,154492
+49,64,84624
+50,42,115347
+51,57,170002
+52,27,77650
+53,61,194319
+54,64,146511
+55,54,70994
+56,28,79595
+57,54,167330
+58,21,159014
+59,53,181127
+60,60,59041
+61,36,177013
+62,59,119661
+63,22,174224
+64,41,195124
+65,32,127376
+66,46,75423
+67,18,142470
+68,58,151026
+69,23,150863
+70,50,65221
+71,19,92981
+72,63,54096
+73,30,136488
+74,48,169240
+75,39,140927
+76,53,78653
+77,40,192880
+78,32,158487
+79,43,177842
+80,35,110398
+81,45,119726
+82,23,95703
+83,21,121015
+84,64,115280
+85,59,175748
+86,54,69330
+87,43,106977
+88,23,194021
+89,37,58599
+90,53,72392
+91,28,130882
+92,55,119198
+93,35,137784
+94,56,90932
+95,45,122731
+96,44,145239
+97,59,110002
+98,51,121982
+99,60,176370
+100,26,134661
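
The visualization notebook groups this data by age and averages the salary for ages from 37 up to, but not including, 47. As a rough illustration of what that query computes, here is a plain-Python sketch that is not part of the commit. The function name `average_salary_by_age` is hypothetical, and `SAMPLE_CSV` holds only a few rows copied from the file above, so its averages will differ from those on the full 100 rows.

```python
import csv
import io
from collections import defaultdict

# A handful of rows taken from sample_data_simple.csv (not the full file).
SAMPLE_CSV = """id,age,salary
3,64,82938
10,43,197818
12,44,62913
17,41,141233
19,46,144365
23,44,156476
24,37,128305
"""


def average_salary_by_age(csv_text, min_age=37, max_age=47):
    """Average salary per age for min_age <= age < max_age, like the notebook query."""
    totals = defaultdict(lambda: [0, 0])  # age -> [salary sum, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        age = int(row["age"])
        if min_age <= age < max_age:
            totals[age][0] += int(row["salary"])
            totals[age][1] += 1
    return {age: total / count for age, (total, count) in sorted(totals.items())}


for age, avg_salary in average_salary_by_age(SAMPLE_CSV).items():
    print(age, avg_salary)
```

The half-open range matches the `age>=37 and age <47` predicate in the notebook, which is why the row with age 64 is left out.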
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index b3a8b79cef..76c1562735 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -50,7 +50,8 @@ CarbonData can be integrated with Spark, Presto, Flink and Hive execution engine
[Installing and Configuring CarbonData Thrift Server for Query Execution](#query-execution-using-the-thrift-server)
### Notebook
-[Using CarbonData in notebook](#using-carbondata-in-notebook.md)
+[Using CarbonData in notebook](#using-carbondata-in-notebook.md)
+[Using CarbonData to visualization in notebook](#using-carbondata-to-visualization_in-notebook.md)
#### Presto
[Installing and Configuring CarbonData on Presto](#installing-and-configuring-carbondata-on-presto)
diff --git a/docs/using-carbondata-in-notebook.md b/docs/using-carbondata-in-notebook.md
index 2bf239fd41..ca9449f733 100644
--- a/docs/using-carbondata-in-notebook.md
+++ b/docs/using-carbondata-in-notebook.md
@@ -69,8 +69,11 @@ http://127.0.0.1:8888/?token=f2f24cd38ddb1d2e11d8dd09ab27a2062dca66efbc50c75c
## Using carbondata in notebook:
Opening the carbondata_notebook.ipynb
-
+
+
+You can also open this file from the notebook directory: [carbondata_notebook.ipynb](#notebook/carbondata_notebook.ipynb)
Running carbondata example in notebook file:
-
+
+
diff --git a/docs/using-carbondata-in-notebook.md b/docs/using-carbondata-to-visualization_in-notebook.md
similarity index 87%
copy from docs/using-carbondata-in-notebook.md
copy to docs/using-carbondata-to-visualization_in-notebook.md
index 2bf239fd41..371ec75916 100644
--- a/docs/using-carbondata-in-notebook.md
+++ b/docs/using-carbondata-to-visualization_in-notebook.md
@@ -67,10 +67,14 @@ http://127.0.0.1:8888/?token=f2f24cd38ddb1d2e11d8dd09ab27a2062dca66efbc50c75c
```
## Using carbondata in notebook:
-Opening the carbondata_notebook.ipynb
+Opening the carbondata_notebook_with_visualization.ipynb
-
+
-Running carbondata example in notebook file:
+You also can open this file from notebook directory:[carbondata_notebook_with_visualization.ipynb](#notebook/carbondata_notebook_with_visualization.ipynb)
-
+Running carbondata example to visualization in notebook file:
+
+
+
+