[20/54] incubator-carbondata-site git commit: Wip for Automating Documentation for Website

chenliang613 Wed, 12 Apr 2017 04:51:48 -0700

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/content/docs/latest/useful-tips-on-carbondata.html
----------------------------------------------------------------------
diff --git a/content/docs/latest/useful-tips-on-carbondata.html 
b/content/docs/latest/useful-tips-on-carbondata.html
deleted file mode 100644
index f4f9cc0..0000000
--- a/content/docs/latest/useful-tips-on-carbondata.html
+++ /dev/null
@@ -1,209 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
---><h1>Useful Tips</h1><p>This tutorial guides you to create CarbonData Tables 
and optimize performance. The following sections will elaborate on the above 
topics :</p>
-<ul>
-  <li><a href="#carbondata-table">Suggestions to create CarbonData 
Table</a></li>
-  <li><a href="#carbondata-performance">Configurations For Optimizing 
CarbonData Performance</a></li>
-</ul><h2 id="carbondata-table">Suggestions to Create CarbonData 
Table</h2><p>Recently CarbonData was used to analyze performance of 
Telecommunication field. The results of the analysis for table creation with 
dimensions ranging from 10 thousand to 10 billion rows and 100 to 300 columns 
have been summarized below. </p><p>The following table describes some of the 
columns from the table used.</p><p><strong>Table Column Description</strong></p>
-<table class="table table-striped table-bordered">
-  <thead>
-    <tr>
-      <th>Column Name </th>
-      <th>Data Type </th>
-      <th>Cardinality </th>
-      <th>Attribution </th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>msisdn </td>
-      <td>String </td>
-      <td>30 million </td>
-      <td>Dimension </td>
-    </tr>
-    <tr>
-      <td>BEGIN_TIME </td>
-      <td>BigInt </td>
-      <td>10 Thousand </td>
-      <td>Dimension </td>
-    </tr>
-    <tr>
-      <td>HOST </td>
-      <td>String </td>
-      <td>1 million </td>
-      <td>Dimension </td>
-    </tr>
-    <tr>
-      <td>Dime_1 </td>
-      <td>String </td>
-      <td>1 Thousand </td>
-      <td>Dimension </td>
-    </tr>
-    <tr>
-      <td>counter_1 </td>
-      <td>Numeric(20,0) </td>
-      <td>NA </td>
-      <td>Measure </td>
-    </tr>
-    <tr>
-      <td>... </td>
-      <td>... </td>
-      <td>NA </td>
-      <td>Measure </td>
-    </tr>
-    <tr>
-      <td>counter_100 </td>
-      <td>Numeric(20,0) </td>
-      <td>NA </td>
-      <td>Measure </td>
-    </tr>
-  </tbody>
-</table><p>CarbonData has more than 50 test cases, on the basis of these we 
have following suggestions to enhance the query performance :</p>
-<ul>
-  <li><strong>Put the frequently-used column filter in the 
beginning</strong></li>
-</ul><p>For example, MSISDN filter is used in most of the query then we must 
put the MSISDN in the first column. The create table command can be modified as 
suggested below :</p><p><code>
-  create table carbondata_table(
-  msisdn String,
-  ...
-  )STORED BY &#39;org.apache.carbondata.format&#39; 
-  TBLPROPERTIES ( &#39;DICTIONARY_EXCLUDE&#39;=&#39;MSISDN,..&#39;,
-  &#39;DICTIONARY_INCLUDE&#39;=&#39;...&#39;);
-</code></p><p>Now the query with MSISDN in the filter will be more 
efficient.</p>
-<ul>
-  <li><strong>Put the frequently-used columns in the order of low to high 
cardinality</strong></li>
-</ul><p>If the table in the specified query has multiple columns which are 
frequently used to filter the results, it is suggested to put  the columns in 
the order of cardinality low to high. This ordering of frequently used columns 
improves the compression ratio and  enhances the performance of queries with 
filter on these columns.</p><p>For example if MSISDN, HOST and Dime_1 are 
frequently-used columns, then the column order of table is suggested as  
Dime_1&gt;HOST&gt;MSISDN as Dime_1 has the lowest cardinality.  The create 
table command can be modified as suggested below :</p><p><code>
-  create table carbondata_table(
-  Dime_1 String,
-  HOST String,
-  MSISDN String,
-  ...
-  )STORED BY &#39;org.apache.carbondata.format&#39; 
-  TBLPROPERTIES ( &#39;DICTIONARY_EXCLUDE&#39;=&#39;MSISDN,HOST..&#39;,
-  &#39;DICTIONARY_INCLUDE&#39;=&#39;Dime_1..&#39;);
-</code></p>
-<ul>
-  <li><strong>Put the Dimension type columns in order of low to high 
cardinality</strong></li>
-</ul><p>If the columns used to filter are not frequently used, then it is 
suggested to order all the columns of dimension type in order of low to high 
cardinality. The create table command can be modified as below :</p><p><code>
-  create table carbondata_table(
-  Dime_1 String,
-  BEGIN_TIME bigint
-  HOST String,
-  MSISDN String,
-  ...
-  )STORED BY &#39;org.apache.carbondata.format&#39; 
-  TBLPROPERTIES ( &#39;DICTIONARY_EXCLUDE&#39;=&#39;MSISDN,HOST,IMSI..&#39;,
-  &#39;DICTIONARY_INCLUDE&#39;=&#39;Dime_1,END_TIME,BEGIN_TIME..&#39;);
-</code></p>
-<ul>
-  <li><strong>For measure type columns with non high accuracy, replace 
Numeric(20,0) data type with Double data type</strong></li>
-</ul><p>For columns of measure type, not requiring high accuracy, it is 
suggested to replace Numeric data type with Double to enhance query 
performance. The create table command can be modified as below :</p><p><code>
-  create table carbondata_table(
-  Dime_1 String,
-  BEGIN_TIME bigint
-  HOST String,
-  MSISDN String,
-  counter_1 double,
-  counter_2 double,
-  ...
-  counter_100 double
-  )STORED BY &#39;org.apache.carbondata.format&#39; 
-  TBLPROPERTIES ( &#39;DICTIONARY_EXCLUDE&#39;=&#39;MSISDN,HOST,IMSI&#39;,
-  &#39;DICTIONARY_INCLUDE&#39;=&#39;Dime_1,END_TIME,BEGIN_TIME&#39;);
-</code>  The result of performance analysis of test-case shows reduction in 
query execution time from 15 to 3 seconds, thereby improving performance by 
nearly 5 times.</p>
-<ul>
-  <li><strong>Columns of incremental character should be re-arranged at the 
end of dimensions</strong></li>
-</ul><p>Consider the following scenario where data is loaded each day and the 
start_time is incremental for each load, it is suggested to put start_time at 
the end of dimensions. </p><p>Incremental values are efficient in using min/max 
index. The create table command can be modified as below :</p><p><code>
-  create table carbondata_table(
-  Dime_1 String,
-  HOST String,
-  MSISDN String,
-  counter_1 double,
-  counter_2 double,
-  BEGIN_TIME bigint,
-  ...
-  counter_100 double
-  )STORED BY &#39;org.apache.carbondata.format&#39; 
-  TBLPROPERTIES ( &#39;DICTIONARY_EXCLUDE&#39;=&#39;MSISDN,HOST,IMSI&#39;,
-  &#39;DICTIONARY_INCLUDE&#39;=&#39;Dime_1,END_TIME,BEGIN_TIME&#39;); 
-</code></p>
-<ul>
-  <li><strong>Avoid adding high cardinality columns to dictionary</strong></li>
-</ul><p>If the system has low memory configuration, then it is suggested to 
exclude high cardinality columns from the dictionary to enhance load 
performance. Creation of dictionary for high cardinality columns at time of 
load will degrade load performance due to excessive memory usage. </p><p>By 
default CarbonData determines the cardinality at the first data load and allows 
for dictionary creation only if the cardinality is less than 1 million.</p>
-<h2 id="carbondata-performance">Configurations for Optimizing CarbonData 
Performance</h2><p>Recently we did some performance POC on CarbonData for 
Finance and telecommunication Field. It involved detailed queries and 
aggregation scenarios. After the completion of POC, some of the configurations 
impacting the performance have been identified and tabulated below :</p>
-<table class="table table-striped table-bordered">
-  <thead>
-    <tr>
-      <th>Parameter </th>
-      <th>Location </th>
-      <th>Used For </th>
-      <th>Description </th>
-      <th>Tuning </th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>carbon.sort.intermediate.files.limit </td>
-      <td>spark/carbonlib/carbon.properties </td>
-      <td>Data loading </td>
-      <td>During the loading of data, local temp is used to sort the data. 
This number specifies the minimum number of intermediate files after which the 
merge sort has to be initiated. </td>
-      <td>Increasing the parameter to a higher value will improve the load 
performance. For example, when we increase the value from 20 to 100, it 
increases the data load performance from 35MB/S to more than 50MB/S. Higher 
values of this parameter consumes more memory during the load. </td>
-    </tr>
-    <tr>
-      <td>carbon.number.of.cores.while.loading </td>
-      <td>spark/carbonlib/carbon.properties </td>
-      <td>Data loading </td>
-      <td>Specifies the number of cores used for data processing during data 
loading in CarbonData. </td>
-      <td>If you have more number of CPUs, then you can increase the number of 
CPUs, which will increase the performance. For example if we increase the value 
from 2 to 4 then the CSV reading performance can increase about 1 times </td>
-    </tr>
-    <tr>
-      <td>carbon.compaction.level.threshold </td>
-      <td>spark/carbonlib/carbon.properties </td>
-      <td>Data loading and Querying </td>
-      <td>For minor compaction, specifies the number of segments to be merged 
in stage 1 and number of compacted segments to be merged in stage 2. </td>
-      <td>Each CarbonData load will create one segment, if every load is small 
in size it will generate many small file over a period of time impacting the 
query performance. Configuring this parameter will merge the small segment to 
one big segment which will sort the data and improve the performance. For 
Example in one telecommunication scenario, the performance improves about 2 
times after minor compaction. </td>
-    </tr>
-    <tr>
-      <td>spark.sql.shuffle.partitions </td>
-      <td>spark/con/spark-defaults.conf </td>
-      <td>Querying </td>
-      <td>The number of task started when spark shuffle. </td>
-      <td>The value can be 1 to 2 times as much as the executor cores. In an 
aggregation scenario, reducing the number from 200 to 32 reduced the query time 
from 17 to 9 seconds. </td>
-    </tr>
-    <tr>
-      <td>num-executors/executor-cores/executor-memory </td>
-      <td>spark/con/spark-defaults.conf </td>
-      <td>Querying </td>
-      <td>The number of executors, CPU cores, and memory used for CarbonData 
query. </td>
-      <td>In the bank scenario, we provide the 4 CPUs cores and 15 GB for each 
executor which can get good performance. This 2 value does not mean more the 
better. It needs to be configured properly in case of limited resources. For 
example, In the bank scenario, it has enough CPU 32 cores each node but less 
memory 64 GB each node. So we cannot give more CPU but less memory. For 
example, when 4 cores and 12GB for each executor. It sometimes happens GC 
during the query which impact the query performance very much from the 3 second 
to more than 15 seconds. In this scenario need to increase the memory or 
decrease the CPU cores. </td>
-    </tr>
-    <tr>
-      <td>carbon.detail.batch.size </td>
-      <td>spark/carbonlib/carbon.properties </td>
-      <td>Data loading </td>
-      <td>The buffer size to store records, returned from the block scan. </td>
-      <td>In limit scenario this parameter is very important. For example your 
query limit is 1000. But if we set this value to 3000 that means we get 3000 
records from scan but spark will only take 1000 rows. So the 2000 remaining are 
useless. In one Finance test case after we set it to 100, in the limit 1000 
scenario the performance increase about 2 times in comparison to if we set this 
value to 12000. </td>
-    </tr>
-    <tr>
-      <td>carbon.use.local.dir </td>
-      <td>spark/carbonlib/carbon.properties </td>
-      <td>Data loading </td>
-      <td>Whether use YARN local directories for multi-table load disk load 
balance </td>
-      <td>If this is set it to true CarbonData will use YARN local directories 
for multi-table load disk load balance, that will improve the data load 
performance. </td>
-    </tr>
-  </tbody>
-</table>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/content/docs/latest/user-guide-toc.html
----------------------------------------------------------------------
diff --git a/content/docs/latest/user-guide-toc.html 
b/content/docs/latest/user-guide-toc.html
deleted file mode 100644
index 790135a..0000000
--- a/content/docs/latest/user-guide-toc.html
+++ /dev/null
@@ -1,50 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License atin  
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
---><h1>User Guide</h1><p>Welcome to Apache CarbonData. Apache 
CarbonData(incubating) is a new big data file format for faster interactive 
query using advanced columnar storage, index, compression and encoding 
techniques to improve computing efficiency, which helps in speeding up queries 
by an order of magnitude faster over PetaBytes of data. This user guide 
provides a detailed description about the CarbonData and its 
features.</p><p>Let's get started !</p>
-<ul>
-  <li><a href="quick-start-guide.html">Quick Start</a></li>
-  <li><a href="overview-of-carbondata.html">Overview</a>
-  <ul>
-    <li><a href="overview-of-carbondata.html">Introduction</a></li>
-    <li><a href="overview-of-carbondata.html">Features</a></li>
-    <li><a href="supported-data-types-in-carbondata.html">Data Types</a></li>
-    <li><a href="file-structure-of-carbondata.html">CarbonData File 
Structure</a></li>
-  </ul></li>
-  <li><a href="installation-guide.html">Installation Guide</a>
-  <ul>
-    <li><a href="installation-guide.html">Installing and Configuring 
CarbonData on Standalone Spark Cluster</a></li>
-    <li><a href="installation-guide.html">Installing and Configuring 
CarbonData on "Spark on YARN Cluster</a></li>
-  </ul></li>
-  <li><a href="configuration-parameters.html">Configuring CarbonData</a>
-  <ul>
-    <li><a href="configuration-parameters.html">System Configuration</a></li>
-    <li><a href="configuration-parameters.html">Performance 
Configuration</a></li>
-    <li><a href="configuration-parameters.html">Miscellaneous 
Configuration</a></li>
-    <li><a href="configuration-parameters.html">Spark Configuration</a></li>
-  </ul></li>
-  <li><a href="using-carbondata.html">Using CarbonData</a>
-  <ul>
-    <li><a href="data-management.html">Data Management</a></li>
-    <li><a href="ddl-operation-on-carbondata.html">DDL Operations on 
CarbonData</a></li>
-    <li><a href="dml-operation-on-carbondata.html">DML Operations on 
CarbonData</a></li>
-  </ul></li>
-  <li><a href="useful-tips-on-carbondata.html">Useful Tips</a> </li>
-  <li><a href="troubleshooting.html">Troubleshooting</a> </li>
-  <li><a href="faq.html">FAQs</a> </li>
-
-</ul>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/content/docs/latest/using-carbondata.html
----------------------------------------------------------------------
diff --git a/content/docs/latest/using-carbondata.html 
b/content/docs/latest/using-carbondata.html
deleted file mode 100644
index 29a3e1c..0000000
--- a/content/docs/latest/using-carbondata.html
+++ /dev/null
@@ -1,33 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-<script src="../../js/mdNavigation.js" type="text/javascript"></script>
-<h1>Using CarbonData</h1><p>This tutorial discusses the disciplines related to 
management of data in Apache CarbonData. Following below each section is a 
brief introduction to respective disciplines related to data 
management.</p><h2>Data Management</h2><p>This section shall be dealing with 
the disciplines related to managing data in the application, focusing on 
conceptual details related to operations like load data, delete data, update 
data and Compacting Data.</p><p>For complete details refer to Data 
Management</p><h2>Data Definition Language Support</h2><p>This section deals 
with the aspects related to creation and modification of the structure of 
database. It shall discuss in detail about</p>
-<ul>
-  <li>Table creation</li>
-  <li>Table deletion</li>
-  <li>Table description</li>
-  <li>Compaction</li>
-</ul><p>For complete details refer to <a 
href="ddl-operation-on-carbondata.html">DDL Operations on 
CarbonData</a></p><h2>Data Manipulation Language Support</h2><p>This section 
deals with the aspects related to data manipulation in database. It shall 
discuss in detail about selecting, loading and deleting in a database. This 
manipulation comprises of</p>
-<ul>
-  <li>Loading data into database tables</li>
-  <li>Retrieving existing data</li>
-  <li>Deleting data from existing tables</li>
-  <li>Deleting segments from existing tables</li>
-  <li>Updating data in existing tables</li>
-</ul><p>For complete details refer to <a 
href="dml-operation-on-carbondata.html">DML Operations on CarbonData</a></p>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/content/fonts/FontAwesome.otf
----------------------------------------------------------------------
diff --git a/content/fonts/FontAwesome.otf b/content/fonts/FontAwesome.otf
deleted file mode 100644
index 401ec0f..0000000
Binary files a/content/fonts/FontAwesome.otf and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/content/fonts/OpenSans-Bold.eot
----------------------------------------------------------------------
diff --git a/content/fonts/OpenSans-Bold.eot b/content/fonts/OpenSans-Bold.eot
deleted file mode 100644
index 016123b..0000000
Binary files a/content/fonts/OpenSans-Bold.eot and /dev/null differ

[20/54] incubator-carbondata-site git commit: Wip for Automating Documentation for Website

Reply via email to