Repository: carbondata
Updated Branches:
  refs/heads/master f6d964050 -> a06939b69


[CARBONDATA-2050] Add example of query data with specified segments

This closes #1829


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/a06939b6
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/a06939b6
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/a06939b6

Branch: refs/heads/master
Commit: a06939b695c4df4ce6190c22e469fd2695b1603d
Parents: f6d9640
Author: chenliang613 <[email protected]>
Authored: Thu Jan 18 15:50:23 2018 +0800
Committer: QiangCai <[email protected]>
Committed: Thu Jan 18 18:42:43 2018 +0800

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md           |  48 ++-----
 .../examples/QuerySegmentExample.scala          | 138 +++++++++++++++++++
 2 files changed, 152 insertions(+), 34 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/a06939b6/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 859a060..3a2c1d3 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -781,59 +781,39 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   ```
  DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' 
   ```
-### SEGMENT READING
+
+### QUERY DATA WITH SPECIFIED SEGMENTS
 
   This command is used to read data from specified segments during CarbonScan.
   
-  
   Get the Segment ID:
-  
   ```
   SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
   ```
   
-  Set the segment IDs
-  
+  Set the segment IDs for table
   ```
-  SET cabon.input.segments.<database_name>.<table_name> = <list of segment IDs>;
+  SET carbon.input.segments.<database_name>.<table_name> = <list of segment IDs>
   ```
   
-  **Property:**
-  
-  cabon.input.segments:  Specifies the segment IDs to be queried. This property allows you to query specified segments of the specified table. The CarbonScan will read data from specified segments only.
-  
-  ```
-  SET cabon.input.segments.<database_name>.<table_name> = <list of segment IDs>;
-  ```
+  NOTE:
+  carbon.input.segments: Specifies the segment IDs to be queried. This property allows you to query specified segments of the specified table. The CarbonScan will read data from specified segments only.
   
  If user wants to query with segments reading in multi threading mode, then CarbonSession.threadSet can be used instead of SET query.
-  
  ```
-  CarbonSession.threadSet ("cabon.input.segments.<database_name>.<table_name>","<list of segment IDs>");
+  CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","<list of segment IDs>");
   ```
   
-  Reset the segment IDs:
-  
+  Reset the segment IDs
   ```
-  SET cabon.input.segments.<database_name>.<table_name> = *;
+  SET carbon.input.segments.<database_name>.<table_name> = *;
   ```
   
  If user wants to query with segments reading in multi threading mode, then CarbonSession.threadSet can be used instead of SET query.
-  
  ```
-  CarbonSession.threadSet ("cabon.input.segments.<database_name>.<table_name>","*");
+  CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","*");
   ```
   
-  Reset
-  
-  It will reset all the properties set for carbondata. It is not recommended if you do not want to reset all the properties except cabon.input.segments.
-  
-  ```
-  RESET
-  ```
-  
-  **NOTE**: It is not recommended to set this property in carbon.properties file, because all the sessions will take this segments list unless it is overwritten at session or thread level.
-  
   **Examples:**
   
  * Example to show the list of segment IDs,segment status, and other required details and then specify the list of segments to be read.
@@ -841,13 +821,13 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   ```
   SHOW SEGMENTS FOR carbontable1;
   
-  SET cabon.input.segments.db.carbontable1 = 1,3,9;
+  SET carbon.input.segments.db.carbontable1 = 1,3,9;
   ```
   
   * Example to query with segments reading in multi threading mode:
   
   ```
-  CarbonSession.threadSet ("cabon.input.segments.db.carbontable_Multi_Thread","1,3");
+  CarbonSession.threadSet ("carbon.input.segments.db.carbontable_Multi_Thread","1,3");
   ```
   
  * Example for threadset in multithread environment (following shows how it is used in Scala code):
@@ -855,8 +835,8 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   ```
   def main(args: Array[String]) {
   Future {          
-    CarbonSession.threadSet ("cabon.input.segments.db.carbontable_Multi_Thread","1")
-    spark.sql("select count(empno) from cabon.input.segments.db.carbontable_Multi_Thread").show();
+    CarbonSession.threadSet ("carbon.input.segments.db.carbontable_Multi_Thread","1")
+    spark.sql("select count(empno) from carbon.input.segments.db.carbontable_Multi_Thread").show();
      }
    }
   ```
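The two mechanisms the updated docs describe (session-level `SET` versus thread-level `CarbonSession.threadSet`) can be combined in one driver program. A minimal sketch under assumptions: the table `db.carbontable1` is hypothetical, the session is created via the repository's `ExampleUtils` helper, and a real run requires a working CarbonData deployment:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

import org.apache.spark.sql.CarbonSession
import org.apache.carbondata.examples.ExampleUtils

object SegmentQuerySketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical session, created the same way as the repository's examples
    val spark = ExampleUtils.createCarbonSession("SegmentQuerySketch")

    // Session level: queries in this session read only segments 1 and 3
    spark.sql("SET carbon.input.segments.db.carbontable1 = 1,3")
    spark.sql("SELECT count(*) FROM db.carbontable1").show()

    // Thread level: the property is scoped to the thread running this Future,
    // so concurrent queries can each pin their own segment list
    val f = Future {
      CarbonSession.threadSet("carbon.input.segments.db.carbontable1", "1")
      spark.sql("SELECT count(*) FROM db.carbontable1").show()
    }
    Await.result(f, Duration.Inf)

    // Reset so later queries in this session see all segments again
    spark.sql("SET carbon.input.segments.db.carbontable1 = *")
    spark.stop()
  }
}
```

The design point is that `SET` applies to the whole session, while `threadSet` applies per thread, which is what makes it safe when multiple Futures query different segment lists concurrently.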

http://git-wip-us.apache.org/repos/asf/carbondata/blob/a06939b6/examples/spark2/src/main/scala/org/apache/carbondata/examples/QuerySegmentExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/QuerySegmentExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/QuerySegmentExample.scala
new file mode 100644
index 0000000..03312a0
--- /dev/null
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/QuerySegmentExample.scala
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+/**
+ * This example introduces how to query data with specified segments
+ */
+
+object QuerySegmentExample {
+
+  def main(args: Array[String]) {
+    val spark = ExampleUtils.createCarbonSession("QuerySegmentExample")
+    spark.sparkContext.setLogLevel("ERROR")
+
+    spark.sql("DROP TABLE IF EXISTS carbon_table")
+
+    // Create table
+    spark.sql(
+      s"""
+         | CREATE TABLE carbon_table(
+         | shortField SHORT,
+         | intField INT,
+         | bigintField LONG,
+         | doubleField DOUBLE,
+         | stringField STRING,
+         | timestampField TIMESTAMP,
+         | decimalField DECIMAL(18,2),
+         | dateField DATE,
+         | charField CHAR(5),
+         | floatField FLOAT
+         | )
+         | STORED BY 'carbondata'
+       """.stripMargin)
+
+    val rootPath = new File(this.getClass.getResource("/").getPath
+                            + "../../../..").getCanonicalPath
+    val path = s"$rootPath/examples/spark2/src/main/resources/data.csv"
+
+    // load 4 segments, each load has 10 rows data
+    // scalastyle:off
+    (1 to 4).foreach(_ => spark.sql(
+        s"""
+           | LOAD DATA LOCAL INPATH '$path'
+           | INTO TABLE carbon_table
+           | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#')
+       """.stripMargin))
+    // scalastyle:on
+
+    // 1.Query data with specified segments without compaction
+
+    spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").show()
+    // 40 rows
+    spark.sql(
+      s"""
+         | SELECT count(*)
+         | FROM carbon_table
+       """.stripMargin).show()
+
+    // specify segments to query
+    spark.sql("SET carbon.input.segments.default.carbon_table = 1,3")
+    // 20 rows from segment1 and segment3
+    spark.sql(
+      s"""
+         | SELECT count(*)
+         | FROM carbon_table
+       """.stripMargin).show()
+
+    // 2.Query data with specified segments after compaction
+
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.COMPACTION_SEGMENT_LEVEL_THRESHOLD, "3,2")
+
+    spark.sql("ALTER TABLE carbon_table COMPACT 'MINOR'")
+    spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").show()
+
+    // Reset to query all segments data
+    spark.sql("SET carbon.input.segments.default.carbon_table = *")
+    // 40 rows from all segments
+    spark.sql(
+      s"""
+         | SELECT count(*)
+         | FROM carbon_table
+       """.stripMargin).show()
+    // After MINOR compaction, 0.1 has 30 rows data(compact 3 segments)
+    spark.sql("SET carbon.input.segments.default.carbon_table = 0.1")
+    spark.sql(
+      s"""
+         | SELECT count(*)
+         | FROM carbon_table
+       """.stripMargin).show()
+
+    spark.sql("ALTER TABLE carbon_table COMPACT 'MAJOR'")
+    spark.sql("CLEAN FILES FOR TABLE carbon_table")
+    spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").show()
+
+    // Load 2 new segments
+    (1 to 2).foreach(_ => spark.sql(
+      s"""
+         | LOAD DATA LOCAL INPATH '$path'
+         | INTO TABLE carbon_table
+         | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#')
+       """.stripMargin))
+
+    spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").show()
+    // 50 rows: segment0.2 has 40 rows after major compaction, plus segment5 with 10 rows
+    spark.sql("SET carbon.input.segments.default.carbon_table = 0.2,5")
+    spark.sql(
+      s"""
+         | SELECT count(*)
+         | FROM carbon_table
+       """.stripMargin).show()
+
+    // Drop table
+    spark.sql("DROP TABLE IF EXISTS carbon_table")
+    spark.stop()
+  }
+
+}
\ No newline at end of file
