Repository: carbondata-site
Updated Branches:
  refs/heads/asf-site 711502d1e -> c2b417a32


http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/main/webapp/streaming-guide.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/streaming-guide.html 
b/src/main/webapp/streaming-guide.html
new file mode 100644
index 0000000..8d3effe
--- /dev/null
+++ b/src/main/webapp/streaming-guide.html
@@ -0,0 +1,384 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <link href='images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <title>CarbonData</title>
+    <style>
+
+    </style>
+    <!-- Bootstrap -->
+
+    <link rel="stylesheet" href="css/bootstrap.min.css">
+    <link href="css/style.css" rel="stylesheet">
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+    <script 
src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js";></script>
+    <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js";></script>
+    <![endif]-->
+    <script src="js/jquery.min.js"></script>
+    <script src="js/bootstrap.min.js"></script>
+
+
+</head>
+<body>
+<header>
+    <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper">
+        <div class="container">
+            <div class="navbar-header">
+                <button aria-controls="navbar" aria-expanded="false" 
data-target="#navbar" data-toggle="collapse"
+                        class="navbar-toggle collapsed" type="button">
+                    <span class="sr-only">Toggle navigation</span>
+                    <span class="icon-bar"></span>
+                    <span class="icon-bar"></span>
+                    <span class="icon-bar"></span>
+                </button>
+                <a href="index.html" class="logo">
+                    <img src="images/CarbonDataLogo.png" alt="CarbonData logo" 
title="CarbonData logo"/>
+                </a>
+            </div>
+            <div class="navbar-collapse collapse cd_navcontnt" id="navbar">
+                <ul class="nav navbar-nav navbar-right navlist-custom">
+                    <li><a href="index.html" class="hidden-xs"><i class="fa 
fa-home" aria-hidden="true"></i> </a>
+                    </li>
+                    <li><a href="index.html" class="hidden-lg hidden-md 
hidden-sm">Home</a></li>
+                    <li class="dropdown">
+                        <a href="#" class="dropdown-toggle " 
data-toggle="dropdown" role="button" aria-haspopup="true"
+                           aria-expanded="false"> Download <span 
class="caret"></span></a>
+                        <ul class="dropdown-menu">
+                            <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/";
+                                   target="_blank">Apache CarbonData 
1.3.0</a></li>
+                            <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.2.0/";
+                                   target="_blank">Apache CarbonData 
1.2.0</a></li>
+                            <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.1.1/";
+                                   target="_blank">Apache CarbonData 
1.1.1</a></li>
+                            <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.1.0/";
+                                   target="_blank">Apache CarbonData 
1.1.0</a></li>
+                            <li>
+                                <a 
href="http://archive.apache.org/dist/incubator/carbondata/1.0.0-incubating/";
+                                   target="_blank">Apache CarbonData 
1.0.0</a></li>
+                            <li>
+                                <a 
href="http://archive.apache.org/dist/incubator/carbondata/0.2.0-incubating/";
+                                   target="_blank">Apache CarbonData 
0.2.0</a></li>
+                            <li>
+                                <a 
href="http://archive.apache.org/dist/incubator/carbondata/0.1.1-incubating/";
+                                   target="_blank">Apache CarbonData 
0.1.1</a></li>
+                            <li>
+                                <a 
href="http://archive.apache.org/dist/incubator/carbondata/0.1.0-incubating/";
+                                   target="_blank">Apache CarbonData 
0.1.0</a></li>
+                            <li>
+                                <a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Releases";
+                                   target="_blank">Release Archive</a></li>
+                        </ul>
+                    </li>
+                    <li><a href="mainpage.html" 
class="active">Documentation</a></li>
+                    <li class="dropdown">
+                        <a href="#" class="dropdown-toggle" 
data-toggle="dropdown" role="button" aria-haspopup="true"
+                           aria-expanded="false">Community <span 
class="caret"></span></a>
+                        <ul class="dropdown-menu">
+                            <li>
+                                <a 
href="https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md";
+                                   target="_blank">Contributing to 
CarbonData</a></li>
+                            <li>
+                                <a 
href="https://github.com/apache/carbondata/blob/master/docs/release-guide.md";
+                                   target="_blank">Release Guide</a></li>
+                            <li>
+                                <a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/PMC+and+Committers+member+list";
+                                   target="_blank">Project PMC and 
Committers</a></li>
+                            <li>
+                                <a 
href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609";
+                                   target="_blank">CarbonData Meetups</a></li>
+                            <li><a href="security.html">Apache CarbonData 
Security</a></li>
+                            <li><a 
href="https://issues.apache.org/jira/browse/CARBONDATA"; target="_blank">Apache
+                                Jira</a></li>
+                            <li><a href="videogallery.html">CarbonData Videos 
</a></li>
+                        </ul>
+                    </li>
+                    <li class="dropdown">
+                        <a href="http://www.apache.org/"; class="apache_link 
hidden-xs dropdown-toggle"
+                           data-toggle="dropdown" role="button" 
aria-haspopup="true" aria-expanded="false">Apache</a>
+                        <ul class="dropdown-menu">
+                            <li><a href="http://www.apache.org/"; 
target="_blank">Apache Homepage</a></li>
+                            <li><a href="http://www.apache.org/licenses/"; 
target="_blank">License</a></li>
+                            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";
+                                   target="_blank">Sponsorship</a></li>
+                            <li><a 
href="http://www.apache.org/foundation/thanks.html"; 
target="_blank">Thanks</a></li>
+                        </ul>
+                    </li>
+
+                    <li class="dropdown">
+                        <a href="http://www.apache.org/"; class="hidden-lg 
hidden-md hidden-sm dropdown-toggle"
+                           data-toggle="dropdown" role="button" 
aria-haspopup="true" aria-expanded="false">Apache</a>
+                        <ul class="dropdown-menu">
+                            <li><a href="http://www.apache.org/"; 
target="_blank">Apache Homepage</a></li>
+                            <li><a href="http://www.apache.org/licenses/"; 
target="_blank">License</a></li>
+                            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";
+                                   target="_blank">Sponsorship</a></li>
+                            <li><a 
href="http://www.apache.org/foundation/thanks.html"; 
target="_blank">Thanks</a></li>
+                        </ul>
+                    </li>
+
+                    <li>
+                        <a href="#" id="search-icon"><i class="fa fa-search" 
aria-hidden="true"></i></a>
+
+                    </li>
+
+                </ul>
+            </div><!--/.nav-collapse -->
+            <div id="search-box">
+                <form method="get" action="http://www.google.com/search"; 
target="_blank">
+                    <div class="search-block">
+                        <table border="0" cellpadding="0" width="100%">
+                            <tr>
+                                <td style="width:80%">
+                                    <input type="text" name="q" size="5" 
maxlength="255" value=""
+                                           class="search-input"  
placeholder="Search...."    required/>
+                                </td>
+                                <td style="width:20%">
+                                    <input type="submit" value="Search"/></td>
+                            </tr>
+                            <tr>
+                                <td align="left" style="font-size:75%" 
colspan="2">
+                                    <input type="checkbox" name="sitesearch" 
value="carbondata.apache.org" checked/>
+                                    <span style=" position: relative; top: 
-3px;"> Only search for CarbonData</span>
+                                </td>
+                            </tr>
+                        </table>
+                    </div>
+                </form>
+            </div>
+        </div>
+    </nav>
+</header> <!-- end Header part -->
+
+<div class="fixed-padding"></div> <!--  top padding with fixde header  -->
+
+<section><!-- Dashboard nav -->
+    <div class="container-fluid q">
+        <div class="col-sm-12  col-md-12 maindashboard">
+            <div class="row">
+                <section>
+                    <div style="padding:10px 15px;">
+                        <div id="viewpage" name="viewpage">
+                            <div class="row">
+                                <div class="col-sm-12  col-md-12">
+                                    <div><h1>
+<a id="carbondata-streaming-ingestion" class="anchor" 
href="#carbondata-streaming-ingestion" aria-hidden="true"><span 
aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData Streaming 
Ingestion</h1>
+<h2>
+<a id="quick-example" class="anchor" href="#quick-example" 
aria-hidden="true"><span aria-hidden="true" class="octicon 
octicon-link"></span></a>Quick example</h2>
+<p>Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME</p>
+<p>Package carbon jar, and copy 
assembly/target/scala-2.11/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar 
to $SPARK_HOME/jars</p>
+<div class="highlight highlight-source-shell"><pre>mvn clean package 
-DskipTests -Pspark-2.2</pre></div>
+<p>Start a socket data server in a terminal</p>
+<div class="highlight highlight-source-shell"><pre> nc -lk 9099</pre></div>
+<p>type some CSV rows as following</p>
+<pre lang="csv"><code>1,col1
+2,col2
+3,col3
+4,col4
+5,col5
+</code></pre>
+<p>Start spark-shell in new terminal, type :paste, then copy and run the 
following code.</p>
+<div class="highlight highlight-source-scala"><pre> <span 
class="pl-k">import</span> <span class="pl-smi">java.io.</span><span 
class="pl-smi">File</span>
+ <span class="pl-k">import</span> <span 
class="pl-smi">org.apache.spark.sql.</span>{<span 
class="pl-smi">CarbonEnv</span>, <span class="pl-smi">SparkSession</span>}
+ <span class="pl-k">import</span> <span 
class="pl-smi">org.apache.spark.sql.CarbonSession.</span><span 
class="pl-smi">_</span>
+ <span class="pl-k">import</span> <span 
class="pl-smi">org.apache.spark.sql.streaming.</span>{<span 
class="pl-smi">ProcessingTime</span>, <span 
class="pl-smi">StreamingQuery</span>}
+ <span class="pl-k">import</span> <span 
class="pl-smi">org.apache.carbondata.core.util.path.</span><span 
class="pl-smi">CarbonStorePath</span>
+ 
+ <span class="pl-k">val</span> <span class="pl-en">warehouse</span> <span 
class="pl-k">=</span> <span class="pl-k">new</span> <span 
class="pl-en">File</span>(<span class="pl-s"><span 
class="pl-pds">"</span>./warehouse<span 
class="pl-pds">"</span></span>).getCanonicalPath
+ <span class="pl-k">val</span> <span class="pl-en">metastore</span> <span 
class="pl-k">=</span> <span class="pl-k">new</span> <span 
class="pl-en">File</span>(<span class="pl-s"><span 
class="pl-pds">"</span>./metastore<span 
class="pl-pds">"</span></span>).getCanonicalPath
+ 
+ <span class="pl-k">val</span> <span class="pl-en">spark</span> <span 
class="pl-k">=</span> <span class="pl-en">SparkSession</span>
+   .builder()
+   .master(<span class="pl-s"><span class="pl-pds">"</span>local<span 
class="pl-pds">"</span></span>)
+   .appName(<span class="pl-s"><span class="pl-pds">"</span>StreamExample<span 
class="pl-pds">"</span></span>)
+   .config(<span class="pl-s"><span 
class="pl-pds">"</span>spark.sql.warehouse.dir<span 
class="pl-pds">"</span></span>, warehouse)
+   .getOrCreateCarbonSession(warehouse, metastore)
+
+ spark.sparkContext.setLogLevel(<span class="pl-s"><span 
class="pl-pds">"</span>ERROR<span class="pl-pds">"</span></span>)
+
+ <span class="pl-c"><span class="pl-c">//</span> drop table if exists 
previously</span>
+ spark.sql(s<span class="pl-s"><span class="pl-pds">"</span>DROP TABLE IF 
EXISTS carbon_table<span class="pl-pds">"</span></span>)
+ <span class="pl-c"><span class="pl-c">//</span> Create target carbon table 
and populate with initial data</span>
+ spark.sql(
+   s<span class="pl-s"><span class="pl-pds">"""</span></span>
+<span class="pl-s">      | CREATE TABLE carbon_table (</span>
+<span class="pl-s">      | col1 INT,</span>
+<span class="pl-s">      | col2 STRING</span>
+<span class="pl-s">      | )</span>
+<span class="pl-s">      | STORED BY 'carbondata'</span>
+<span class="pl-s">      | TBLPROPERTIES('streaming'='true')<span 
class="pl-pds">"""</span></span>.stripMargin)
+
+ <span class="pl-k">val</span> <span class="pl-en">carbonTable</span> <span 
class="pl-k">=</span> <span class="pl-en">CarbonEnv</span>.getCarbonTable(<span 
class="pl-en">Some</span>(<span class="pl-s"><span 
class="pl-pds">"</span>default<span class="pl-pds">"</span></span>), <span 
class="pl-s"><span class="pl-pds">"</span>carbon_table<span 
class="pl-pds">"</span></span>)(spark)
+ <span class="pl-k">val</span> <span class="pl-en">tablePath</span> <span 
class="pl-k">=</span> <span 
class="pl-en">CarbonStorePath</span>.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
+ 
+ <span class="pl-c"><span class="pl-c">//</span> batch load</span>
+ <span class="pl-k">var</span> <span class="pl-en">qry</span><span 
class="pl-k">:</span> <span class="pl-en">StreamingQuery</span> <span 
class="pl-k">=</span> <span class="pl-c1">null</span>
+ <span class="pl-k">val</span> <span class="pl-en">readSocketDF</span> <span 
class="pl-k">=</span> spark.readStream
+   .format(<span class="pl-s"><span class="pl-pds">"</span>socket<span 
class="pl-pds">"</span></span>)
+   .option(<span class="pl-s"><span class="pl-pds">"</span>host<span 
class="pl-pds">"</span></span>, <span class="pl-s"><span 
class="pl-pds">"</span>localhost<span class="pl-pds">"</span></span>)
+   .option(<span class="pl-s"><span class="pl-pds">"</span>port<span 
class="pl-pds">"</span></span>, <span class="pl-c1">9099</span>)
+   .load()
+
+ <span class="pl-c"><span class="pl-c">//</span> Write data from socket stream 
to carbondata file</span>
+ qry <span class="pl-k">=</span> readSocketDF.writeStream
+   .format(<span class="pl-s"><span class="pl-pds">"</span>carbondata<span 
class="pl-pds">"</span></span>)
+   .trigger(<span class="pl-en">ProcessingTime</span>(<span class="pl-s"><span 
class="pl-pds">"</span>5 seconds<span class="pl-pds">"</span></span>))
+   .option(<span class="pl-s"><span 
class="pl-pds">"</span>checkpointLocation<span class="pl-pds">"</span></span>, 
tablePath.getStreamingCheckpointDir)
+   .option(<span class="pl-s"><span class="pl-pds">"</span>dbName<span 
class="pl-pds">"</span></span>, <span class="pl-s"><span 
class="pl-pds">"</span>default<span class="pl-pds">"</span></span>)
+   .option(<span class="pl-s"><span class="pl-pds">"</span>tableName<span 
class="pl-pds">"</span></span>, <span class="pl-s"><span 
class="pl-pds">"</span>carbon_table<span class="pl-pds">"</span></span>)
+   .start()
+
+ <span class="pl-c"><span class="pl-c">//</span> start new thread to show 
data</span>
+ <span class="pl-k">new</span> <span class="pl-en">Thread</span>() {
+   <span class="pl-k">override</span> <span class="pl-k">def</span> <span 
class="pl-en">run</span>()<span class="pl-k">:</span> <span 
class="pl-k">Unit</span> <span class="pl-k">=</span> {
+     <span class="pl-k">do</span> {
+       spark.sql(<span class="pl-s"><span class="pl-pds">"</span>select * from 
carbon_table<span class="pl-pds">"</span></span>).show(<span 
class="pl-c1">false</span>)
+       <span class="pl-en">Thread</span>.sleep(<span 
class="pl-c1">10000</span>)
+     } <span class="pl-k">while</span> (<span class="pl-c1">true</span>)
+   }
+ }.start()
+
+ qry.awaitTermination()</pre></div>
+<p>Continue to type some rows into data server, and spark-shell will show the 
new data of the table.</p>
+<h2>
+<a id="create-table-with-streaming-property" class="anchor" 
href="#create-table-with-streaming-property" aria-hidden="true"><span 
aria-hidden="true" class="octicon octicon-link"></span></a>Create table with 
streaming property</h2>
+<p>Streaming table is just a normal carbon table with "streaming" table 
property, user can create
+streaming table using following DDL.</p>
+<div class="highlight highlight-source-sql"><pre> <span 
class="pl-k">CREATE</span> <span class="pl-k">TABLE</span> <span 
class="pl-en">streaming_table</span> (
+  col1 <span class="pl-k">INT</span>,
+  col2 STRING
+ )
+ STORED BY <span class="pl-s"><span class="pl-pds">'</span>carbondata<span 
class="pl-pds">'</span></span>
+ TBLPROPERTIES(<span class="pl-s"><span class="pl-pds">'</span>streaming<span 
class="pl-pds">'</span></span><span class="pl-k">=</span><span 
class="pl-s"><span class="pl-pds">'</span>true<span 
class="pl-pds">'</span></span>)</pre></div>
+<table>
+<thead>
+<tr>
+<th>property name</th>
+<th>default</th>
+<th>description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>streaming</td>
+<td>false</td>
+<td>Whether to enable streaming ingest feature for this table <br> Value 
range: true, false</td>
+</tr>
+</tbody>
+</table>
+<p>"DESC FORMATTED" command will show streaming property.</p>
+<div class="highlight highlight-source-sql"><pre><span 
class="pl-k">DESC</span> FORMATTED streaming_table</pre></div>
+<h2>
+<a id="alter-streaming-property" class="anchor" 
href="#alter-streaming-property" aria-hidden="true"><span aria-hidden="true" 
class="octicon octicon-link"></span></a>Alter streaming property</h2>
+<p>For an old table, use ALTER TABLE command to set the streaming property.</p>
+<div class="highlight highlight-source-sql"><pre><span 
class="pl-k">ALTER</span> <span class="pl-k">TABLE</span> streaming_table <span 
class="pl-k">SET</span> TBLPROPERTIES(<span class="pl-s"><span 
class="pl-pds">'</span>streaming<span class="pl-pds">'</span></span><span 
class="pl-k">=</span><span class="pl-s"><span class="pl-pds">'</span>true<span 
class="pl-pds">'</span></span>)</pre></div>
+<h2>
+<a id="acquire-streaming-lock" class="anchor" href="#acquire-streaming-lock" 
aria-hidden="true"><span aria-hidden="true" class="octicon 
octicon-link"></span></a>Acquire streaming lock</h2>
+<p>At the beginning of streaming ingestion, the system will try to acquire the 
table level lock of streaming.lock file. If the system isn't able to acquire 
the lock of this table, it will throw an InterruptedException.</p>
+<h2>
+<a id="create-streaming-segment" class="anchor" 
href="#create-streaming-segment" aria-hidden="true"><span aria-hidden="true" 
class="octicon octicon-link"></span></a>Create streaming segment</h2>
+<p>The input data of streaming will be ingested into a segment of the 
CarbonData table, the status of this segment is streaming. CarbonData calls it a 
streaming segment. The "tablestatus" file will record the segment status and 
data size. The user can use "SHOW SEGMENTS FOR TABLE tableName" to check 
segment status.</p>
+<p>After the streaming segment reaches the max size, CarbonData will change 
the segment status to "streaming finish" from "streaming", and create new 
"streaming" segment to continue to ingest streaming data.</p>
+<table>
+<thead>
+<tr>
+<th>option</th>
+<th>default</th>
+<th>description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.streaming.segment.max.size</td>
+<td>1024000000</td>
+<td>Unit: byte <br>max size of streaming segment</td>
+</tr>
+</tbody>
+</table>
+<table>
+<thead>
+<tr>
+<th>segment status</th>
+<th>description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>streaming</td>
+<td>The segment is running streaming ingestion</td>
+</tr>
+<tr>
+<td>streaming finish</td>
+<td>The segment already finished streaming ingestion, <br> it will be handed 
off to a segment in the columnar format</td>
+</tr>
+</tbody>
+</table>
+<h2>
+<a id="change-segment-status" class="anchor" href="#change-segment-status" 
aria-hidden="true"><span aria-hidden="true" class="octicon 
octicon-link"></span></a>Change segment status</h2>
+<p>Use below command to change the status of "streaming" segment to "streaming 
finish" segment.</p>
+<div class="highlight highlight-source-sql"><pre><span 
class="pl-k">ALTER</span> <span class="pl-k">TABLE</span> streaming_table 
FINISH STREAMING</pre></div>
+<h2>
+<a id="handoff-streaming-finish-segment-to-columnar-segment" class="anchor" 
href="#handoff-streaming-finish-segment-to-columnar-segment" 
aria-hidden="true"><span aria-hidden="true" class="octicon 
octicon-link"></span></a>Handoff "streaming finish" segment to columnar 
segment</h2>
+<p>Use below command to handoff "streaming finish" segment to columnar format 
segment manually.</p>
+<div class="highlight highlight-source-sql"><pre><span 
class="pl-k">ALTER</span> <span class="pl-k">TABLE</span> streaming_table 
COMPACT <span class="pl-s"><span class="pl-pds">'</span>streaming<span 
class="pl-pds">'</span></span>
+</pre></div>
+<h2>
+<a id="auto-handoff-streaming-segment" class="anchor" 
href="#auto-handoff-streaming-segment" aria-hidden="true"><span 
aria-hidden="true" class="octicon octicon-link"></span></a>Auto handoff 
streaming segment</h2>
+<p>Config the property "carbon.streaming.auto.handoff.enabled" to auto handoff 
streaming segment. If the value of this property is true, after the streaming 
segment reaches the max size, CarbonData will change this segment to "streaming 
finish" status and trigger to auto handoff this segment to columnar format 
segment in a new thread.</p>
+<table>
+<thead>
+<tr>
+<th>property name</th>
+<th>default</th>
+<th>description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.streaming.auto.handoff.enabled</td>
+<td>true</td>
+<td>whether to auto trigger handoff operation</td>
+</tr>
+</tbody>
+</table>
+<h2>
+<a id="close-streaming-table" class="anchor" href="#close-streaming-table" 
aria-hidden="true"><span aria-hidden="true" class="octicon 
octicon-link"></span></a>Close streaming table</h2>
+<p>Use below command to handoff all streaming segments to columnar format 
segments and modify the streaming property to false, this table becomes a 
normal table.</p>
+<div class="highlight highlight-source-sql"><pre><span 
class="pl-k">ALTER</span> <span class="pl-k">TABLE</span> streaming_table 
COMPACT <span class="pl-s"><span class="pl-pds">'</span>close_streaming<span 
class="pl-pds">'</span></span>
+</pre></div>
+<h2>
+<a id="constraint" class="anchor" href="#constraint" aria-hidden="true"><span 
aria-hidden="true" class="octicon octicon-link"></span></a>Constraint</h2>
+<ol>
+<li>reject set streaming property from true to false.</li>
+<li>reject UPDATE/DELETE command on the streaming table.</li>
+<li>reject create pre-aggregation DataMap on the streaming table.</li>
+<li>reject add the streaming property on the table with pre-aggregation 
DataMap.</li>
+<li>if the table has dictionary columns, it will not support concurrent data 
loading.</li>
+<li>block delete "streaming" segment while the streaming ingestion is 
running.</li>
+<li>block drop the streaming table while the streaming ingestion is 
running.</li>
+</ol>
+</div>
+</div>
+</div>
+</div>
+<div class="doc-footer">
+    <a href="#top" class="scroll-top">Top</a>
+</div>
+</div>
+</section>
+</div>
+</div>
+</div>
+</section><!-- End systemblock part -->
+<script src="js/custom.js"></script>
+</body>
+</html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/main/webapp/supported-data-types-in-carbondata.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/supported-data-types-in-carbondata.html 
b/src/main/webapp/supported-data-types-in-carbondata.html
index 51a1216..dd4cbc5 100644
--- a/src/main/webapp/supported-data-types-in-carbondata.html
+++ b/src/main/webapp/supported-data-types-in-carbondata.html
@@ -51,6 +51,9 @@
                            aria-expanded="false"> Download <span 
class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/";
+                                   target="_blank">Apache CarbonData 
1.3.0</a></li>
+                            <li>
                                 <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.2.0/";
                                    target="_blank">Apache CarbonData 
1.2.0</a></li>
                             <li>

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/main/webapp/troubleshooting.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/troubleshooting.html 
b/src/main/webapp/troubleshooting.html
index 6129686..3a2e311 100644
--- a/src/main/webapp/troubleshooting.html
+++ b/src/main/webapp/troubleshooting.html
@@ -51,6 +51,9 @@
                            aria-expanded="false"> Download <span 
class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/";
+                                   target="_blank">Apache CarbonData 
1.3.0</a></li>
+                            <li>
                                 <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.2.0/";
                                    target="_blank">Apache CarbonData 
1.2.0</a></li>
                             <li>

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/main/webapp/useful-tips-on-carbondata.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/useful-tips-on-carbondata.html 
b/src/main/webapp/useful-tips-on-carbondata.html
index 3d36da8..cb19036 100644
--- a/src/main/webapp/useful-tips-on-carbondata.html
+++ b/src/main/webapp/useful-tips-on-carbondata.html
@@ -51,6 +51,9 @@
                            aria-expanded="false"> Download <span 
class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/";
+                                   target="_blank">Apache CarbonData 
1.3.0</a></li>
+                            <li>
                                 <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.2.0/";
                                    target="_blank">Apache CarbonData 
1.2.0</a></li>
                             <li>

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/main/webapp/videogallery.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/videogallery.html 
b/src/main/webapp/videogallery.html
index 154aad2..bfa5a49 100644
--- a/src/main/webapp/videogallery.html
+++ b/src/main/webapp/videogallery.html
@@ -49,6 +49,10 @@
                            aria-expanded="false"> Download <span 
class="caret"></span></a>
                         <ul class="dropdown-menu">
                             <li>
+                                <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.3.0/";
+                                   target="_blank">Apache CarbonData 
1.3.0</a></li>
+
+                            <li>
                                 <a 
href="https://dist.apache.org/repos/dist/release/carbondata/1.2.0/";
                                    target="_blank">Apache CarbonData 
1.2.0</a></li>
                             <li>

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/site/markdown/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/data-management-on-carbondata.md 
b/src/site/markdown/data-management-on-carbondata.md
index 18ad5b8..c846ffc 100644
--- a/src/site/markdown/data-management-on-carbondata.md
+++ b/src/site/markdown/data-management-on-carbondata.md
@@ -627,21 +627,21 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
   
   ```
   LOAD DATA [LOCAL] INPATH 'folder_path' 
-    INTO TABLE [db_name.]table_name PARTITION (partition_spec) 
-    OPTIONS(property_name=property_value, ...)
-  NSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) SELECT 
STATMENT 
+  INTO TABLE [db_name.]table_name PARTITION (partition_spec) 
+  OPTIONS(property_name=property_value, ...)
+    
+  INSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) 
<SELECT STATMENT>
   ```
   
   Example:
   ```
-  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
-    INTO TABLE locationTable
-    PARTITION (country = 'US', state = 'CA')
+  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+  INTO TABLE locationTable
+  PARTITION (country = 'US', state = 'CA')
     
   INSERT INTO TABLE locationTable
-    PARTITION (country = 'US', state = 'AL')
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US' AND au.state = 'AL';
+  PARTITION (country = 'US', state = 'AL')
+  SELECT <columns list excluding partition columns> FROM another_user
   ```
 
 #### Load Data Using Dynamic Partition
@@ -650,12 +650,11 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
 
   Example:
   ```
-  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
-    INTO TABLE locationTable
+  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+  INTO TABLE locationTable
           
   INSERT INTO TABLE locationTable
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US' AND au.state = 'AL';
+  SELECT <columns list excluding partition columns> FROM another_user
   ```
 
 #### Show Partitions
@@ -679,19 +678,19 @@ This tutorial is going to introduce all commands and data 
operations on CarbonDa
   
   ```
    INSERT OVERWRITE TABLE table_name
-    PARTITION (column = 'partition_name')
-    select_statement
+   PARTITION (column = 'partition_name')
+   select_statement
   ```
   
   Example:
   ```
   INSERT OVERWRITE TABLE partitioned_user
-    PARTITION (country = 'US')
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US';
+  PARTITION (country = 'US')
+  SELECT * FROM another_user au 
+  WHERE au.country = 'US';
   ```
 
-### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition not 
supports update and delete data.
+### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition 
feature does not support updating and deleting data.
 
   The partition supports three type:(Hash,Range,List), similar to other 
system's partition features, CarbonData's partition feature can be used to 
improve query performance by filtering on the partition column.
 
@@ -886,11 +885,11 @@ will be transformed by Query Planner to fetch data from 
pre-aggregate table **ag
 
 But queries of kind
 ```
-SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY 
country, sex
+SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY 
user_id, country, sex
 
 SELECT sex, avg(quantity) from sales GROUP BY sex
 
-SELECT max(price), country from sales GROUP BY country
+SELECT country, max(price) from sales GROUP BY country
 ```
 
 will fetch the data from the main table **sales**
@@ -910,18 +909,13 @@ pre-aggregate tables satisfy the query condition, the 
plan is transformed automa
 pre-aggregate table to fetch the data
 
 ##### Compacting pre-aggregate tables
-Compaction is an optional operation for pre-aggregate table. If compaction is 
performed on main 
-table but not performed on pre-aggregate table, all queries still can benefit 
from pre-aggregate 
-table.To further improve performance on pre-aggregate table, compaction can be 
triggered on 
-pre-aggregate tables directly, it will merge the segments inside 
pre-aggregation table. 
-To do that, use ALTER TABLE COMPACT command on the pre-aggregate table just 
like the main table
+Compaction command (ALTER TABLE COMPACT) needs to be run separately on each 
pre-aggregate table.
+Running Compaction command on main table will **not automatically** compact 
the pre-aggregate 
+tables. Compaction is an optional operation for pre-aggregate table. If 
compaction is performed on
+main table but not performed on pre-aggregate table, all queries still can 
benefit from 
+pre-aggregate tables. To further improve performance on pre-aggregate tables, 
compaction can be 
+triggered on pre-aggregate tables directly, it will merge the segments inside 
pre-aggregate table. 
 
-  NOTE:
-  * If the aggregate function used in the pre-aggregate table creation 
included distinct-count,
-     during compaction, the pre-aggregate table values are recomputed.This 
would a costly 
-     operation as compared to the compaction of pre-aggregate tables 
containing other aggregate 
-     functions alone
- 
 ##### Update/Delete Operations on pre-aggregate tables
 This functionality is not supported.
 
@@ -1005,16 +999,6 @@ roll-up for the queries on these hierarchies.
   ) AS
   SELECT order_time, country, sex, sum(quantity), max(quantity), 
count(user_id), sum(price),
    avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time’=’order_time’,
-  'minute_granualrity’=’1’,
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), 
count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
   ```
   
   For Querying data and automatically roll-up to the desired aggregation 
level,Carbondata supports 
@@ -1028,9 +1012,7 @@ roll-up for the queries on these hierarchies.
   ```
   
   It is **not necessary** to create pre-aggregate tables for each granularity 
unless required for 
-  query
-  .Carbondata
-   can roll-up the data and fetch it
+  query. CarbonData can roll-up the data and fetch it.
    
   For Example: For main table **sales** , If pre-aggregate tables were created 
as  
   

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/site/markdown/streaming-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/streaming-guide.md 
b/src/site/markdown/streaming-guide.md
new file mode 100644
index 0000000..201f8e0
--- /dev/null
+++ b/src/site/markdown/streaming-guide.md
@@ -0,0 +1,169 @@
+# CarbonData Streaming Ingestion
+
+## Quick example
+Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME
+
+Package carbon jar, and copy 
assembly/target/scala-2.11/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.7.2.jar 
to $SPARK_HOME/jars
+```shell
+mvn clean package -DskipTests -Pspark-2.2
+```
+
+Start a socket data server in a terminal
+```shell
+ nc -lk 9099
+```
+ type some CSV rows as following
+```csv
+1,col1
+2,col2
+3,col3
+4,col4
+5,col5
+```
+
+Start spark-shell in new terminal, type :paste, then copy and run the 
following code.
+```scala
+ import java.io.File
+ import org.apache.spark.sql.{CarbonEnv, SparkSession}
+ import org.apache.spark.sql.CarbonSession._
+ import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
+ import org.apache.carbondata.core.util.path.CarbonStorePath
+ 
+ val warehouse = new File("./warehouse").getCanonicalPath
+ val metastore = new File("./metastore").getCanonicalPath
+ 
+ val spark = SparkSession
+   .builder()
+   .master("local")
+   .appName("StreamExample")
+   .config("spark.sql.warehouse.dir", warehouse)
+   .getOrCreateCarbonSession(warehouse, metastore)
+
+ spark.sparkContext.setLogLevel("ERROR")
+
+ // drop table if exists previously
+ spark.sql(s"DROP TABLE IF EXISTS carbon_table")
+ // Create target carbon table and populate with initial data
+ spark.sql(
+   s"""
+      | CREATE TABLE carbon_table (
+      | col1 INT,
+      | col2 STRING
+      | )
+      | STORED BY 'carbondata'
+      | TBLPROPERTIES('streaming'='true')""".stripMargin)
+
+ val carbonTable = CarbonEnv.getCarbonTable(Some("default"), 
"carbon_table")(spark)
+ val tablePath = 
CarbonStorePath.getCarbonTablePath(carbonTable.getAbsoluteTableIdentifier)
+ 
+ // batch load
+ var qry: StreamingQuery = null
+ val readSocketDF = spark.readStream
+   .format("socket")
+   .option("host", "localhost")
+   .option("port", 9099)
+   .load()
+
+ // Write data from socket stream to carbondata file
+ qry = readSocketDF.writeStream
+   .format("carbondata")
+   .trigger(ProcessingTime("5 seconds"))
+   .option("checkpointLocation", tablePath.getStreamingCheckpointDir)
+   .option("dbName", "default")
+   .option("tableName", "carbon_table")
+   .start()
+
+ // start new thread to show data
+ new Thread() {
+   override def run(): Unit = {
+     do {
+       spark.sql("select * from carbon_table").show(false)
+       Thread.sleep(10000)
+     } while (true)
+   }
+ }.start()
+
+ qry.awaitTermination()
+```
+
+Continue to type some rows into data server, and spark-shell will show the new 
data of the table.
+
+## Create table with streaming property
+Streaming table is just a normal carbon table with "streaming" table property, 
user can create
+streaming table using following DDL.
+```sql
+ CREATE TABLE streaming_table (
+  col1 INT,
+  col2 STRING
+ )
+ STORED BY 'carbondata'
+ TBLPROPERTIES('streaming'='true')
+```
+
+ property name | default | description
+ ---|---|--- 
+ streaming | false |Whether to enable streaming ingest feature for this table 
<br /> Value range: true, false 
+ 
+ "DESC FORMATTED" command will show streaming property.
+ ```sql
+ DESC FORMATTED streaming_table
+ ```
+ 
+## Alter streaming property
+For an old table, use ALTER TABLE command to set the streaming property.
+```sql
+ALTER TABLE streaming_table SET TBLPROPERTIES('streaming'='true')
+```
+
+## Acquire streaming lock
+At the beginning of streaming ingestion, the system will try to acquire the table 
level lock of streaming.lock file. If the system isn't able to acquire the lock 
of this table, it will throw an InterruptedException.
+
+## Create streaming segment
+The input data of streaming will be ingested into a segment of the CarbonData 
table, the status of this segment is streaming. CarbonData calls it a streaming 
segment. The "tablestatus" file will record the segment status and data size. 
The user can use “SHOW SEGMENTS FOR TABLE tableName” to check segment 
status. 
+
+After the streaming segment reaches the max size, CarbonData will change the 
segment status to "streaming finish" from "streaming", and create new 
"streaming" segment to continue to ingest streaming data.
+
+option | default | description
+--- | --- | ---
+carbon.streaming.segment.max.size | 1024000000 | Unit: byte <br />max size of 
streaming segment
+
+segment status | description
+--- | ---
+streaming | The segment is running streaming ingestion
+streaming finish | The segment already finished streaming ingestion, <br /> it 
will be handed off to a segment in the columnar format
+
+## Change segment status
+Use below command to change the status of "streaming" segment to "streaming 
finish" segment.
+```sql
+ALTER TABLE streaming_table FINISH STREAMING
+```
+
+## Handoff "streaming finish" segment to columnar segment
+Use below command to handoff "streaming finish" segment to columnar format 
segment manually.
+```sql
+ALTER TABLE streaming_table COMPACT 'streaming'
+
+```
+
+## Auto handoff streaming segment
+Config the property "carbon.streaming.auto.handoff.enabled" to auto handoff 
streaming segment. If the value of this property is true, after the streaming 
segment reaches the max size, CarbonData will change this segment to "streaming 
finish" status and trigger to auto handoff this segment to columnar format 
segment in a new thread.
+
+property name | default | description
+--- | --- | ---
+carbon.streaming.auto.handoff.enabled | true | whether to auto trigger handoff 
operation
+
+## Close streaming table
+Use below command to handoff all streaming segments to columnar format 
segments and modify the streaming property to false, this table becomes a 
normal table.
+```sql
+ALTER TABLE streaming_table COMPACT 'close_streaming'
+
+```
+
+## Constraint
+1. reject set streaming property from true to false.
+2. reject UPDATE/DELETE command on the streaming table.
+3. reject create pre-aggregation DataMap on the streaming table.
+4. reject add the streaming property on the table with pre-aggregation DataMap.
+5. if the table has dictionary columns, it will not support concurrent data 
loading.
+6. block delete "streaming" segment while the streaming ingestion is running.
+7. block drop the streaming table while the streaming ingestion is running.

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/4a4aad2b/src/site/pdf.xml
----------------------------------------------------------------------
diff --git a/src/site/pdf.xml b/src/site/pdf.xml
index d1dd8e9..2fe03a6 100644
--- a/src/site/pdf.xml
+++ b/src/site/pdf.xml
@@ -15,6 +15,7 @@
       <item name="CarbonData File Structure" 
ref='file-structure-of-carbondata.md'/>
       <item name="Installation" ref='installation-guide.md'/>
     <item name="Configuring CarbonData" ref='configuration-parameters.md'/>
+    <item name="Streaming" ref='streaming-guide.md'/>
     <item name="FAQs" ref='faq.md'/>
     <item name="Troubleshooting" ref='troubleshooting.md'/>
     <item name="Useful Tips" ref='useful-tips-on-carbondata.md'/>
@@ -25,7 +26,7 @@
     <companyLogo>../../src/site/projectLogo/ApacheLogo.png</companyLogo>
     <projectLogo>../../src/site/projectLogo/CarbonDataLogo.png</projectLogo>
     <coverTitle>Apache CarbonData</coverTitle>
-    <coverSubTitle>Ver 1.0 </coverSubTitle>
+    <coverSubTitle>Ver 1.3.0 </coverSubTitle>
     <coverType>Documentation</coverType>
     <projectName>Apache CarbonData</projectName>
     <companyName>The Apache Software Foundation</companyName>

Reply via email to