Repository: carbondata
Updated Branches:
  refs/heads/master 9c12f5dc8 -> bfd77f69f


[CARBONDATA-2098] Optimize document for datamap

Optimize document for datamap

This closes #2025


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/bfd77f69
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/bfd77f69
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/bfd77f69

Branch: refs/heads/master
Commit: bfd77f69faf4934fd7d5971b808e1ddc2d35bf2a
Parents: 9c12f5d
Author: Jacky Li <jacky.li...@qq.com>
Authored: Sat Mar 3 11:34:46 2018 +0800
Committer: chenliang613 <chenliang...@huawei.com>
Committed: Sat Mar 3 12:18:13 2018 +0800

----------------------------------------------------------------------
 docs/datamap/preaggregate-datamap-guide.md | 213 ++++++++++++++++
 docs/datamap/timeseries-datamap-guide.md   | 135 ++++++++++
 docs/preaggregate-guide.md                 | 313 ------------------------
 3 files changed, 348 insertions(+), 313 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/bfd77f69/docs/datamap/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/preaggregate-datamap-guide.md b/docs/datamap/preaggregate-datamap-guide.md
new file mode 100644
index 0000000..fabfd7d
--- /dev/null
+++ b/docs/datamap/preaggregate-datamap-guide.md
@@ -0,0 +1,213 @@
+# CarbonData Pre-aggregate DataMap
+  
+## Quick example
+Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME pointing to the unzipped directory.
+
+Package the CarbonData jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars
+```shell
+mvn clean package -DskipTests -Pspark-2.2
+```
+
+Start spark-shell in a new terminal, type :paste, then copy and run the following code.
+```scala
+ import java.io.File
+ import org.apache.spark.sql.SparkSession
+ import org.apache.spark.sql.CarbonSession._
+ 
+ val warehouse = new File("./warehouse").getCanonicalPath
+ val metastore = new File("./metastore").getCanonicalPath
+ 
+ val spark = SparkSession
+   .builder()
+   .master("local")
+   .appName("preAggregateExample")
+   .config("spark.sql.warehouse.dir", warehouse)
+   .getOrCreateCarbonSession(warehouse, metastore)
+
+ spark.sparkContext.setLogLevel("ERROR")
+
+ // drop the table if it exists from a previous run
+ spark.sql(s"DROP TABLE IF EXISTS sales")
+ 
+ // Create main table
+ spark.sql(
+   s"""
+      | CREATE TABLE sales (
+      | user_id string,
+      | country string,
+      | quantity int,
+      | price bigint)
+      | STORED BY 'carbondata'
+    """.stripMargin)
+ 
+ // Create a pre-aggregate table on the main table.
+ // If the main table already has data, the following command
+ // will trigger an immediate load into the pre-aggregate table
+ spark.sql(
+   s"""
+      | CREATE DATAMAP agg_sales
+      | ON TABLE sales
+      | USING "preaggregate"
+      | AS
+      | SELECT country, sum(quantity), avg(price)
+      | FROM sales
+      | GROUP BY country
+    """.stripMargin)
+      
+  import spark.implicits._
+  import org.apache.spark.sql.SaveMode
+  import scala.util.Random
+ 
+  // Load data into the main table; this also
+  // triggers an immediate load into the pre-aggregate table.
+  // The two loads are carried out in a
+  // transactional manner, meaning that the whole
+  // operation fails if either load fails
+  val r = new Random()
+  spark.sparkContext.parallelize(1 to 10)
+   .map(x => ("ID." + r.nextInt(100000), "country" + x % 8, x % 50, x % 60))
+   .toDF("user_id", "country", "quantity", "price")
+   .write
+   .format("carbondata")
+   .option("tableName", "sales")
+   .option("compress", "true")
+   .mode(SaveMode.Append)
+   .save()
+      
+  spark.sql(
+    s"""
+       |SELECT country, sum(quantity), avg(price)
+       | from sales GROUP BY country
+     """.stripMargin).show
+
+  spark.stop
+```
+
+## Pre-aggregate DataMap
+  Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData.
+  Users can create as many pre-aggregate datamaps as required to improve query performance,
+  provided the storage requirements and loading speeds are acceptable.
+
+  Once pre-aggregate datamaps are created, CarbonData's SparkSQL optimizer extension can
+  select the most efficient pre-aggregate datamap and rewrite the SQL to query against the
+  selected datamap instead of the main table. Since the data size of a pre-aggregate datamap
+  is smaller, user queries are much faster. In our previous experience, we have seen 5X to
+  100X speedups in production SQLs.
+    
+  For instance, consider a main table called **sales** which is defined as
+  
+  ```
+  CREATE TABLE sales (
+    order_time timestamp,
+    user_id string,
+    sex string,
+    country string,
+    quantity int,
+    price bigint)
+  STORED BY 'carbondata'
+  ```
+  
+  Users can create pre-aggregate tables using the CREATE DATAMAP DDL
+  
+  ```
+  CREATE DATAMAP agg_sales
+  ON TABLE sales
+  USING "preaggregate"
+  AS
+    SELECT country, sex, sum(quantity), avg(price)
+    FROM sales
+    GROUP BY country, sex
+  ```
+  
+#### Functions supported in pre-aggregate table
+
+| Function | Rollup supported |
+|----------|:----------------:|
+| SUM      |Yes               |
+| AVG      |Yes               |
+| MAX      |Yes               |
+| MIN      |Yes               |
+| COUNT    |Yes               |
+
+
+#### How pre-aggregate tables are selected
+When a user query is submitted, during the query planning phase, CarbonData collects all
+matching pre-aggregate tables as candidates according to Relational Algebra transformation
+rules. Then, the best pre-aggregate table for this query is selected among the candidates
+based on cost. For simplicity, the current cost estimation is based on the data size of the
+pre-aggregate table (we assume that queries are faster on smaller tables).
+
+For the main table **sales** and the pre-aggregate table **agg_sales** created above, the following queries
+```
+SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
+
+SELECT sex, sum(quantity) from sales GROUP BY sex
+
+SELECT sum(price), country from sales GROUP BY country
+``` 
+
+will be transformed by CarbonData's query planner to run against the pre-aggregate table
+**agg_sales** instead of the main table **sales**.
+
+However, the following queries
+```
+SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
+
+SELECT sex, avg(quantity) from sales GROUP BY sex
+
+SELECT country, max(price) from sales GROUP BY country
+```
+
+will run against the main table **sales** only, because they do not satisfy the
+pre-aggregate table selection logic.
+
+#### Loading data to pre-aggregate tables
+For an existing table with loaded data, data load to the pre-aggregate table is triggered by
+the CREATE DATAMAP statement when the user creates the pre-aggregate table. For incremental
+loads after the pre-aggregate tables are created, loading data to the main table triggers the
+load to the pre-aggregate tables once the main table loading is complete.
+
+These loads are transactional, meaning that data in the main table and pre-aggregate tables
+is only visible to the user after all tables are loaded successfully; if one of these loads
+fails, the new data is not visible in any table, as if the load operation had not happened.
+
+#### Querying data from pre-aggregate tables
+As a technique for query acceleration, pre-aggregate tables cannot be queried directly.
+Queries are to be made on the main table. During query planning, CarbonData internally checks
+the pre-aggregate tables associated with the main table, and transforms the query plan
+accordingly.
+
+Users can verify whether a query leverages a pre-aggregate table or not by executing the
+`EXPLAIN` command, which shows the transformed logical plan, from which the user can check
+whether a pre-aggregate table is selected.
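+
+For example, one can run the following and inspect the table name in the printed plan; if
+the rewrite succeeded, the plan refers to the internal datamap table rather than **sales**
+(the exact plan output depends on the Spark version):
+```
+EXPLAIN SELECT country, sum(quantity) FROM sales GROUP BY country
+```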
+
+#### Compacting pre-aggregate tables
+Running the compaction command (`ALTER TABLE COMPACT`) on the main table will **not
+automatically** compact the pre-aggregate tables created on it. Users need to run the
+compaction command separately on each pre-aggregate table to compact them.
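+
+For example, a sketch that compacts the main table and then the datamap table; we assume
+here that the pre-aggregate table is addressed by its internal name, the main table name
+suffixed with the datamap name (`sales_agg_sales`):
+```
+ALTER TABLE sales COMPACT 'minor'
+-- assumed internal name of the agg_sales datamap table
+ALTER TABLE sales_agg_sales COMPACT 'minor'
+```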
+
+Compaction is an optional operation for pre-aggregate tables. If compaction is performed on
+the main table but not on a pre-aggregate table, all queries can still benefit from the
+pre-aggregate tables. To further improve query performance, compaction can be triggered on
+the pre-aggregate tables to merge their segments and files.
+
+#### Data Management on pre-aggregate tables
+Once a pre-aggregate table is created on the main table, the following commands on the main
+table are not supported:
+1. Data management commands: `UPDATE/DELETE/DELETE SEGMENT`.
+2. Schema management commands: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE DATATYPE`,
+`ALTER TABLE RENAME`. Note that adding a new column is supported; for the drop-column and
+change-datatype commands, CarbonData checks whether the operation impacts the pre-aggregate
+table. If not, the operation is allowed; otherwise it is rejected with an exception.
+3. Partition management commands: `ALTER TABLE ADD/DROP PARTITION`
+
+However, there is still a way to perform these operations on the main table. In the current
+CarbonData release, the user can do the following:
+1. Remove the pre-aggregate table with the `DROP DATAMAP` command
+2. Carry out the data management operation on the main table
+3. Create the pre-aggregate table again with the `CREATE DATAMAP` command
+
+Basically, the user manually triggers the operation by re-building the datamap.
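+
+For example, to run an UPDATE against **sales** (a sketch; the UPDATE statement itself is
+only illustrative):
+```
+DROP DATAMAP agg_sales ON TABLE sales
+
+-- illustrative update; any otherwise-unsupported DML goes here
+UPDATE sales SET (price) = (price + 1) WHERE country = 'country1'
+
+-- re-creating the datamap re-builds it from the main table data
+CREATE DATAMAP agg_sales
+ON TABLE sales
+USING "preaggregate"
+AS
+  SELECT country, sex, sum(quantity), avg(price)
+  FROM sales
+  GROUP BY country, sex
+```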
+
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/bfd77f69/docs/datamap/timeseries-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/timeseries-datamap-guide.md b/docs/datamap/timeseries-datamap-guide.md
new file mode 100644
index 0000000..ecd7234
--- /dev/null
+++ b/docs/datamap/timeseries-datamap-guide.md
@@ -0,0 +1,135 @@
+# CarbonData Timeseries DataMap
+
+## Supporting timeseries data (Alpha feature in 1.3.0)
+Timeseries DataMap is a pre-aggregate table implementation based on the 'preaggregate'
+DataMap. The difference is that a timeseries DataMap has a built-in understanding of the
+time hierarchy and its levels: year, month, day, hour, minute, so that it supports
+automatic roll-up in the time dimension for queries.
+
+For instance, the user can create multiple timeseries datamaps on a main table which has an
+*event_time* column, one datamap per time granularity. CarbonData can then do automatic
+roll-up for queries on the main table.
+
+```
+CREATE DATAMAP agg_year
+ON TABLE sales
+USING "timeseries"
+DMPROPERTIES (
+  'event_time'='order_time',
+  'year_granularity'='1'
+) AS
+SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+ avg(price) FROM sales GROUP BY order_time, country, sex
+
+CREATE DATAMAP agg_month
+ON TABLE sales
+USING "timeseries"
+DMPROPERTIES (
+  'event_time'='order_time',
+  'month_granularity'='1'
+) AS
+SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+ avg(price) FROM sales GROUP BY order_time, country, sex
+
+CREATE DATAMAP agg_day
+ON TABLE sales
+USING "timeseries"
+DMPROPERTIES (
+  'event_time'='order_time',
+  'day_granularity'='1'
+) AS
+SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+ avg(price) FROM sales GROUP BY order_time, country, sex
+
+CREATE DATAMAP agg_sales_hour
+ON TABLE sales
+USING "timeseries"
+DMPROPERTIES (
+  'event_time'='order_time',
+  'hour_granularity'='1'
+) AS
+SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+ avg(price) FROM sales GROUP BY order_time, country, sex
+
+CREATE DATAMAP agg_minute
+ON TABLE sales
+USING "timeseries"
+DMPROPERTIES (
+  'event_time'='order_time',
+  'minute_granularity'='1'
+) AS
+SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+ avg(price) FROM sales GROUP BY order_time, country, sex
+```
+  
+For querying timeseries data, CarbonData has built-in support for the following
+time-related UDF to enable automatic roll-up to the desired aggregation level:
+```
+timeseries(timeseries column name, 'aggregation level')
+```
+```
+SELECT timeseries(order_time, 'hour'), sum(quantity) FROM sales GROUP BY timeseries(order_time, 'hour')
+```
+  
+It is **not necessary** to create pre-aggregate tables for each granularity unless required
+for the query; CarbonData can roll-up the data and fetch it.
+
+For example, for the main table **sales**, if the following timeseries datamaps are created
+for day-level and hour-level pre-aggregation
+  
+```
+  CREATE DATAMAP agg_day
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+    'event_time'='order_time',
+    'day_granularity'='1'
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+
+  CREATE DATAMAP agg_sales_hour
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+    'event_time'='order_time',
+    'hour_granularity'='1'
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+```
+
+Queries like the following will be rolled-up and hit the timeseries datamaps
+```
+SELECT timeseries(order_time, 'month'), sum(quantity) FROM sales GROUP BY timeseries(order_time, 'month')
+
+SELECT timeseries(order_time, 'year'), sum(quantity) FROM sales GROUP BY timeseries(order_time, 'year')
+```
+
+NOTE (<b>RESTRICTION</b>):
+* Only a granularity value of 1 is supported for the hierarchy levels. Other values will be
+supported in a future CarbonData release.
+* Timeseries datamaps for the desired levels need to be created one after another.
+* Timeseries datamaps created for each level need to be dropped separately, as shown in the
+sketch below.
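+
+For example, a minimal sketch that removes the day-level and hour-level datamaps created
+above, using the `DROP DATAMAP` command:
+```
+DROP DATAMAP agg_day ON TABLE sales
+
+DROP DATAMAP agg_sales_hour ON TABLE sales
+```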
+
+#### Compacting timeseries datamap
+Refer to the Compaction section in the [preaggregate datamap guide](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).
+The same applies to timeseries datamaps.
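+
+For example, a sketch that compacts the day-level datamap; we assume here that the datamap
+table is addressed by its internal name, the main table name suffixed with the datamap name
+(`sales_agg_day`):
+```
+-- assumed internal name of the agg_day timeseries datamap table
+ALTER TABLE sales_agg_day COMPACT 'minor'
+```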
+
+#### Data Management on timeseries datamap
+Refer to the Data Management section in the [preaggregate datamap guide](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).
+The same applies to timeseries datamaps.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata/blob/bfd77f69/docs/preaggregate-guide.md
----------------------------------------------------------------------
diff --git a/docs/preaggregate-guide.md b/docs/preaggregate-guide.md
deleted file mode 100644
index 411433a..0000000
--- a/docs/preaggregate-guide.md
+++ /dev/null
@@ -1,313 +0,0 @@
-# CarbonData Pre-aggregate tables
-  
-## Quick example
-Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME
-
-Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar to $SPARK_HOME/jars
-```shell
-mvn clean package -DskipTests -Pspark-2.2
-```
-
-Start spark-shell in new terminal, type :paste, then copy and run the following code.
-```scala
- import java.io.File
- import org.apache.spark.sql.{CarbonEnv, SparkSession}
- import org.apache.spark.sql.CarbonSession._
- import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
- import org.apache.carbondata.core.util.path.CarbonStorePath
- 
- val warehouse = new File("./warehouse").getCanonicalPath
- val metastore = new File("./metastore").getCanonicalPath
- 
- val spark = SparkSession
-   .builder()
-   .master("local")
-   .appName("preAggregateExample")
-   .config("spark.sql.warehouse.dir", warehouse)
-   .getOrCreateCarbonSession(warehouse, metastore)
-
- spark.sparkContext.setLogLevel("ERROR")
-
- // drop table if exists previously
- spark.sql(s"DROP TABLE IF EXISTS sales")
- // Create target carbon table and populate with initial data
- spark.sql(
-   s"""
-      | CREATE TABLE sales (
-      | user_id string,
-      | country string,
-      | quantity int,
-      | price bigint)
-      | STORED BY 'carbondata'""".stripMargin)
-      
- spark.sql(
-   s"""
-      | CREATE DATAMAP agg_sales
-      | ON TABLE sales
-      | USING "preaggregate"
-      | AS
-      | SELECT country, sum(quantity), avg(price)
-      | FROM sales
-      | GROUP BY country""".stripMargin)
-      
- import spark.implicits._
- import org.apache.spark.sql.SaveMode
- import scala.util.Random
- 
- val r = new Random()
- val df = spark.sparkContext.parallelize(1 to 10)
-   .map(x => ("ID." + r.nextInt(100000), "country" + x % 8, x % 50, x % 60))
-   .toDF("user_id", "country", "quantity", "price")
-
- // Create table with pre-aggregate table
- df.write.format("carbondata")
-   .option("tableName", "sales")
-   .option("compress", "true")
-   .mode(SaveMode.Append).save()
-      
- spark.sql(
-      s"""
-    |SELECT country, sum(quantity), avg(price)
-    | from sales GROUP BY country""".stripMargin).show
-
- spark.stop
-```
-
-##PRE-AGGREGATE TABLES
-  Carbondata supports pre aggregating of data so that OLAP kind of queries can fetch data
-  much faster.Aggregate tables are created as datamaps so that the handling is as efficient as
-  other indexing support.Users can create as many aggregate tables they require as datamaps to
-  improve their query performance,provided the storage requirements and loading speeds are
-  acceptable.
-  
-  For main table called **sales** which is defined as 
-  
-  ```
-  CREATE TABLE sales (
-  order_time timestamp,
-  user_id string,
-  sex string,
-  country string,
-  quantity int,
-  price bigint)
-  STORED BY 'carbondata'
-  ```
-  
-  user can create pre-aggregate tables using the DDL
-  
-  ```
-  CREATE DATAMAP agg_sales
-  ON TABLE sales
-  USING "preaggregate"
-  AS
-  SELECT country, sex, sum(quantity), avg(price)
-  FROM sales
-  GROUP BY country, sex
-  ```
-  
-
-  
-<b><p align="left">Functions supported in pre-aggregate tables</p></b>
-
-| Function | Rollup supported |
-|-----------|----------------|
-| SUM | Yes |
-| AVG | Yes |
-| MAX | Yes |
-| MIN | Yes |
-| COUNT | Yes |
-
-
-##### How pre-aggregate tables are selected
-For the main table **sales** and pre-aggregate table **agg_sales** created above, queries of the kind
-```
-SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
-
-SELECT sex, sum(quantity) from sales GROUP BY sex
-
-SELECT sum(price), country from sales GROUP BY country
-``` 
-
-will be transformed by Query Planner to fetch data from pre-aggregate table **agg_sales**
-
-But queries of kind
-```
-SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
-
-SELECT sex, avg(quantity) from sales GROUP BY sex
-
-SELECT country, max(price) from sales GROUP BY country
-```
-
-will fetch the data from the main table **sales**
-
-##### Loading data to pre-aggregate tables
-For existing table with loaded data, data load to pre-aggregate table will be triggered by the
-CREATE DATAMAP statement when user creates the pre-aggregate table.
-For incremental loads after aggregates tables are created, loading data to main table triggers
-the load to pre-aggregate tables once main table loading is complete. These loads are automic
-meaning that data on main table and aggregate tables are only visible to the user after all tables
-are loaded
-
-##### Querying data from pre-aggregate tables
-Pre-aggregate tables cannot be queries directly. Queries are to be made on main table. Internally
-carbondata will check associated pre-aggregate tables with the main table, and if the
-pre-aggregate tables satisfy the query condition, the plan is transformed automatically to use
-pre-aggregate table to fetch the data.
-
-##### Compacting pre-aggregate tables
-Compaction command (ALTER TABLE COMPACT) need to be run separately on each pre-aggregate table.
-Running Compaction command on main table will **not automatically** compact the pre-aggregate
-tables.Compaction is an optional operation for pre-aggregate table. If compaction is performed on
-main table but not performed on pre-aggregate table, all queries still can benefit from
-pre-aggregate tables. To further improve performance on pre-aggregate tables, compaction can be
-triggered on pre-aggregate tables directly, it will merge the segments inside pre-aggregate table.
-
-##### Update/Delete Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Update/Delete operations are <b>not supported</b> on main table which has pre-aggregate tables
-  created on it. All the pre-aggregate tables <b>will have to be dropped</b> before update/delete
-  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually
-  after update/delete operations are completed
-
-##### Delete Segment Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Delete Segment operations are <b>not supported</b> on main table which has pre-aggregate tables
-  created on it. All the pre-aggregate tables <b>will have to be dropped</b> before update/delete
-  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually
-  after delete segment operations are completed
-
-##### Alter Table Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Adding new column in new table does not have any affect on pre-aggregate tables. However if
-  dropping or renaming a column has impact in pre-aggregate table, such operations will be
-  rejected and error will be thrown. All the pre-aggregate tables <b>will have to be dropped</b>
-  before Alter Operations can be performed on the main table. Pre-aggregate tables can be rebuilt
-  manually after Alter Table operations are completed
-  
-### Supporting timeseries data (Alpha feature in 1.3.0)
-Carbondata has built-in understanding of time hierarchy and levels: year, month, day, hour, minute.
-Multiple pre-aggregate tables can be created for the hierarchy and Carbondata can do automatic
-roll-up for the queries on these hierarchies.
-
-  ```
-  CREATE DATAMAP agg_year
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'year_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_month
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'month_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_day
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'day_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-        
-  CREATE DATAMAP agg_sales_hour
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'hour_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-  
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'minute_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'minute_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-  ```
-  
-  For Querying data and automatically roll-up to the desired aggregation level,Carbondata supports
-  UDF as
-  ```
-  timeseries(timeseries column name, 'aggregation level')
-  ```
-  ```
-  Select timeseries(order_time, 'hour'), sum(quantity) from sales group by timeseries(order_time,
-  'hour')
-  ```
-  
-  It is **not necessary** to create pre-aggregate tables for each granularity unless required for
-  query. Carbondata can roll-up the data and fetch it.
-
-  For Example: For main table **sales** , If pre-aggregate tables were created as
-  
-  ```
-  CREATE DATAMAP agg_day
-    ON TABLE sales
-    USING "timeseries"
-    DMPROPERTIES (
-    'event_time'='order_time',
-    'day_granualrity'='1',
-    ) AS
-    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-     avg(price) FROM sales GROUP BY order_time, country, sex
-          
-    CREATE DATAMAP agg_sales_hour
-    ON TABLE sales
-    USING "timeseries"
-    DMPROPERTIES (
-    'event_time'='order_time',
-    'hour_granualrity'='1',
-    ) AS
-    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-     avg(price) FROM sales GROUP BY order_time, country, sex
-  ```
-  
-  Queries like below will be rolled-up and fetched from pre-aggregate tables
-  ```
-  Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,
-    'month')
-    
-  Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,
-    'year')
-  ```
-  
-  NOTE (<b>RESTRICTION</b>):
-  * Only value of 1 is supported for hierarchy levels. Other hierarchy levels are not supported.
-  Other hierarchy levels are not supported
-  * pre-aggregate tables for the desired levels needs to be created one after the other
-  * pre-aggregate tables created for each level needs to be dropped separately
-
