[01/54] incubator-carbondata-site git commit: Wip for Automating Documentation for Website

chenliang613 Wed, 12 Apr 2017 04:52:15 -0700

Repository: incubator-carbondata-site
Updated Branches:
  refs/heads/asf-site bc1361dda -> 99fd49060



http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/release-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/release-guide.md 
b/src/site/markdown/release-guide.md
deleted file mode 100644
index 50a0e8a..0000000
--- a/src/site/markdown/release-guide.md
+++ /dev/null
@@ -1,482 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-# Apache CarbonData Release Guide
-
-Apache CarbonData periodically declares and publishes releases.
-
-Each release is executed by a _Release Manager_, who is selected from among the CarbonData committers.
- This document describes the process that the Release Manager follows to 
perform a release. Any 
- changes to this process should be discussed and adopted on the 
- [dev@ mailing list](mailto:[email protected]).
- 
-Please remember that publishing software has legal consequences. This guide 
complements the 
-foundation-wide [Product Release 
Policy](http://www.apache.org/dev/release.html) and [Release 
-Distribution Policy](http://www.apache.org/dev/release-distribution).
-
-## Decide to release
-
-Deciding to release and selecting a Release Manager is the first step of the 
release process. 
-This is a consensus-based decision of the entire community.
-
-Anybody can propose a release on the dev@ mailing list, giving a solid 
argument and nominating a 
-committer as the Release Manager (including themselves). There's no formal 
process, no vote 
-requirements, and no timing requirements. Any objections should be resolved by 
consensus before 
-starting the release.
-
-_Checklist to proceed to next step:_
-
-1. Community agrees to release
-2. Community selects a Release Manager
-
-## Prepare for the release
-
-Before your first release, you should perform one-time configuration steps. This will set up your
- security keys for signing the artifacts and your access to the release repository.
- 
-To prepare for each release, you should audit the project status in Jira and do the necessary
-bookkeeping. Finally, you should tag a release.
-
-### One-time setup instructions
-
-#### GPG Key
-
-You need to have a GPG key to sign the release artifacts. Please be aware of 
the ASF-wide 
-[release signing guidelines](https://www.apache.org/dev/release-signing.html). 
If you don't have 
-a GPG key associated with your Apache account, please create one according to 
the guidelines.
-
-Determine your Apache GPG key and key ID, as follows:
-
-```
-gpg --list-keys
-```
-
-This will list your GPG keys. One of these should reflect your Apache account, for example:
-
-```
-pub   2048R/845E6689 2016-02-23
-uid                  Nomen Nescio <[email protected]>
-sub   2048R/BA4D50BE 2016-02-23
-```
-
-Here, the key ID is the 8-digit hex string in the `pub` line: `845E6689`.
-
-Now, add your Apache GPG key to CarbonData's `KEYS` file in the `dev` and `release` repositories
-at `dist.apache.org`. Follow the instructions listed at the top of these files.
- 
-Configure `git` to use this key when signing code by giving it your key ID, as 
follows:
-
-```
-git config --global user.signingkey 845E6689
-```
-
-You may drop the `--global` option if you'd prefer to use this key for the 
current repository only.
-
-You may wish to start `gpg-agent` to unlock your GPG key only once using your 
passphrase. 
-Otherwise, you may need to enter this passphrase several times. The setup of 
`gpg-agent` varies 
-based on operating system, but may be something like this:
-
-```
-eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info)
-export GPG_TTY=$(tty)
-export GPG_AGENT_INFO
-```
-
-#### Access to Apache Nexus
-
-Configure access to the [Apache Nexus repository](https://repository.apache.org), which is used to
-stage the artifacts and promote them to Maven Central.
-
-1. Log in with your Apache account.
-2. Confirm you have appropriate access by finding `org.apache.carbondata` under `Staging Profiles`.
-3. Navigate to your `Profile` (top right dropdown menu of the page).
-4. Choose `User Token` from the dropdown, then click `Access User Token`. Copy the snippet of the
-Maven XML configuration block.
-5. Insert this snippet twice into your global Maven `settings.xml` file, typically
-`${HOME}/.m2/settings.xml`. The end result should look like this, where `TOKEN_NAME` and
-`TOKEN_PASSWORD` are your secret tokens:
-
-```
- <settings>
-   <servers>
-     <server>
-       <id>apache.releases.https</id>
-       <username>TOKEN_NAME</username>
-       <password>TOKEN_PASSWORD</password>
-     </server>
-     <server>
-       <id>apache.snapshots.https</id>
-       <username>TOKEN_NAME</username>
-       <password>TOKEN_PASSWORD</password>
-     </server>
-   </servers>
- </settings>
-```
-
-#### Create a new version in Jira
-
-When contributors resolve an issue in Jira, they tag it with the release that will contain
-their changes. With the release currently underway, new issues should be resolved against a
-subsequent future release. Therefore, you should create a release item for this subsequent
-release, as follows:
-
-1. In Jira, navigate to `CarbonData > Administration > Versions`.
-2. Add a new release: choose the next minor version number compared to the one 
currently 
-underway, select today's date as the `Start Date`, and choose `Add`. 
-
-#### Triage release-blocking issues in Jira
-
-There could be outstanding release-blocking issues, which should be triaged 
before proceeding to 
-build the release. We track them by assigning a specific `Fix Version` field 
even before the 
-issue is resolved.
-
-The list of release-blocking issues is available at the [version status 
page](https://issues.apache.org/jira/browse/CARBONDATA/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel).
 
-Triage each unresolved issue with one of the following resolutions:
-
-* If the issue has been resolved and Jira was not updated, resolve it 
accordingly.
-* If the issue has not been resolved and it is acceptable to defer until the 
next release, update
- the `Fix Version` field to the new version you just created. Please consider 
discussing this 
- with stakeholders and the dev@ mailing list, as appropriate.
-* If the issue has not been resolved and it is not acceptable to release until 
it is fixed, the 
- release cannot proceed. Instead, work with the CarbonData community to 
resolve the issue.
- 
-#### Review Release Notes in Jira
-
-Jira automatically generates Release Notes based on the `Fix Version` applied 
to the issues. 
-Release Notes are intended for CarbonData users (not CarbonData 
committers/contributors). You 
-should ensure that Release Notes are informative and useful.
-
-Open the release notes from the [version status 
page](https://issues.apache.org/jira/browse/CARBONDATA/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel)
-by choosing the release underway and clicking Release Notes.
-
-You should verify that the issues listed automatically by Jira are appropriate 
to appear in the 
-Release Notes. Specifically, issues should:
-
-* Be appropriately classified as `Bug`, `New Feature`, `Improvement`, etc.
-* Represent noteworthy user-facing changes, such as new functionality, 
backward-incompatible 
-changes, or performance improvements.
-* Have occurred since the previous release; an issue that was introduced and 
fixed between 
-releases should not appear in the Release Notes.
-* Have an issue title that makes sense when read on its own.
-
-Adjust any of the above properties to improve the clarity and presentation of the Release Notes.
-
-#### Verify that a Release Build works
-
-Run `mvn clean install -Prelease` to ensure that the build processes that are 
specific to that 
-profile are in good shape.
-
-_Checklist to proceed to the next step:_
-
-1. Release Manager's GPG key is published to `dist.apache.org`.
-2. Release Manager's GPG key is configured in `git` configuration.
-3. Release Manager has `org.apache.carbondata` listed under `Staging Profiles` 
in Nexus.
-4. Release Manager's Nexus User Token is configured in `settings.xml`.
-5. Jira release item for the subsequent release has been created.
-6. There are no release-blocking Jira issues.
-7. Release Notes in Jira have been audited and adjusted.
-
-### Build a release
-
-Use the Maven release plugin to tag and build release artifacts, as follows:
-
-```
-mvn release:prepare
-```
-
-Use the Maven release plugin to stage these artifacts on the Apache Nexus repository, as follows:
-
-```
-mvn release:perform
-```
-
-Review all staged artifacts. They should contain all relevant parts for each 
module, including 
-`pom.xml`, jar, test jar, source, etc. Artifact names should follow 
-[the existing 
format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.carbondata%22)
-in which artifact name mirrors directory structure. Carefully review any new 
artifacts.
-
-Close the staging repository on Nexus. When prompted for a description, enter 
"Apache CarbonData 
-x.x.x release".
-
-### Stage source release on dist.apache.org
-
-Copy the source release to the `dev` repository on `dist.apache.org`.
-
-1. If you have not already, check out the Incubator section of the `dev` repository on
-`dist.apache.org` via Subversion. In a fresh directory:
-
-```
-svn co https://dist.apache.org/repos/dist/dev/incubator/carbondata
-```
-
-2. Make a directory for the new release:
-
-```
-mkdir x.x.x
-```
-
-3. Copy the CarbonData source distribution, hash, and GPG signature:
-
-```
-cp apache-carbondata-x.x.x-source-release.zip x.x.x
-```
-
-4. Add and commit the files:
-
-```
-svn add x.x.x
-svn commit
-```
-
-5. Verify the files are 
[present](https://dist.apache.org/repos/dist/dev/incubator/carbondata).
-
-### Propose a pull request for website updates
-
-The final step of building a release candidate is to propose a website pull 
request.
-
-This pull request should update the following pages with the new release:
-
-* `src/main/webapp/index.html`
-* `src/main/webapp/docs/latest/mainpage.html`
-
-_Checklist to proceed to the next step:_
-
-1. Maven artifacts deployed to the staging repository of 
-[repository.apache.org](https://repository.apache.org)
-2. Source distribution deployed to the dev repository of
-[dist.apache.org](https://dist.apache.org/repos/dist/dev/incubator/carbondata/)
-3. Website pull request to list the release.
-
-## Vote on the release candidate
-
-Once you have built and individually reviewed the release candidate, please 
share it for the 
-community-wide review. Please review foundation-wide [voting 
guidelines](http://www.apache.org/foundation/voting.html)
-for more information.
-
-Start the review-and-vote thread on the dev@ mailing list. Here's an email 
template; please 
-adjust as you see fit:
-
-```
-From: Release Manager
-To: [email protected]
-Subject: [VOTE] Apache CarbonData Release x.x.x
-
-Hi everyone,
-Please review and vote on the release candidate for the version x.x.x, as 
follows:
-
-[ ] +1, Approve the release
-[ ] -1, Do not approve the release (please provide specific comments)
-
-The complete staging area is available for your review, which includes:
-* JIRA release notes [1],
-* the official Apache source release to be deployed to dist.apache.org [2], 
which is signed with the key with fingerprint FFFFFFFF [3],
-* all artifacts to be deployed to the Maven Central Repository [4],
-* source code tag "x.x.x" [5],
-* website pull request listing the release [6].
-
-The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.
-
-Thanks,
-Release Manager
-
-[1] link
-[2] link
-[3] https://dist.apache.org/repos/dist/dist/incubator/carbondata/KEYS
-[4] link
-[5] link
-[6] link
-```
-
-If there are any issues found in the release candidate, reply on the vote thread to cancel the vote.
-There's no need to wait 72 hours. Proceed to the `Cancel a Release (Fix Issues)` step below and
-address the problem.
-However, some issues don't require cancellation.
-For example, if an issue is found in the website pull request, just correct it on the spot and the
-vote can continue as-is.
-
-If there are no issues, reply on the vote thread to close the voting. Then, tally the votes in a
-separate email. Here's an email template; please adjust as you see fit.
-
-```
-From: Release Manager
-To: [email protected]
-Subject: [RESULT][VOTE] Apache CarbonData Release x.x.x
-
-I'm happy to announce that we have unanimously approved this release.
-
-There are XXX approving votes, XXX of which are binding:
-* approver 1
-* approver 2
-* approver 3
-* approver 4
-
-There are no disapproving votes.
-
-Thanks everyone!
-```
-
-While in incubation, the Apache Incubator PMC must also vote on each release, 
using the same 
-process as above. Start the review and vote thread on the 
`[email protected]` list.
-
-```
-From: Release Manager
-To: [email protected]
-Cc: [email protected]
-Subject: [VOTE] Apache CarbonData release x.x.x-incubating
-
-Hi everyone,
-Please review and vote on the release candidate for the Apache CarbonData 
version x.x.x-incubating,
- as follows:
- 
-[ ] +1, Approve the release
-[ ] -1, Do not approve the release (please provide specific comments)
-
-The complete staging area is available for your review, which includes:
-* JIRA release notes [1],
-* the official Apache source release to be deployed to dist.apache.org [2],
-* all artifacts to be deployed to the Maven Central Repository [3],
-* source code tag "x.x.x" [4],
-* website pull request listing the release [5].
-
-The Apache CarbonData community has unanimously approved this release [6].
-
-As customary, the vote will be open for at least 72 hours. It is adopted by
-a majority approval with at least three PMC affirmative votes. If approved,
-we will proceed with the release.
-
-Thanks!
-
-[1] link
-[2] link
-[3] link
-[4] link
-[5] link
-[6] lists.apache.org permalink to the vote result thread, e.g.,
-https://lists.apache.org/thread.html/32c991987e0abf2a09cd8afad472cf02e482af02ac35418ee8731940@%3Cdev.carbondata.apache.org%3E
-```
-
-If passed, close the voting and summarize the results:
- 
-```
-From: Release Manager
-To: [email protected]
-Cc: [email protected]
-Subject: [RESULT][VOTE] Apache CarbonData release x.x.x-incubating
-
-There are XXX approving votes, all of which are binding:
-* approver 1
-* approver 2
-* approver 3
-* approver 4
-
-There are no disapproving votes.
-
-We'll proceed with this release as staged.
-
-Thanks everyone!
-```
-
-_Checklist to proceed to the final step:_
-
-1. Community votes to release the proposed release
-2. While in incubation, Apache Incubator PMC votes to release the proposed 
release
-
-## Cancel a Release (Fix Issues)
-
-Any issue identified during the community review and vote should be fixed in this step.
-
-To fully cancel a vote:
-
-* Cancel the current release and verify the version is back to the correct 
SNAPSHOT:
-
-```
-mvn release:cancel
-```
-
-* Drop the release tag:
-
-```
-git tag -d x.x.x
-git push --delete apache x.x.x
-```
-
-* Drop the staging repository on Nexus 
([repository.apache.org](https://repository.apache.org))
-
-
-Verify the version is back to the correct SNAPSHOT.
-
-Code changes should be proposed as standard pull requests and merged.
-
-Once all issues have been resolved, you should go back and build a new release 
candidate with 
-these changes.
-
-## Finalize the release
-
-Once the release candidate has been reviewed and approved by the community, the release should be
- finalized. This involves the final deployment of the release to the release repositories,
- merging the website changes, and announcing the release.
- 
-### Deploy artifacts to Maven Central repository
-
-On Nexus, release the staged artifacts to Maven Central repository. In the 
`Staging Repositories`
- section, find the relevant release candidate `orgapachecarbondata-XXX` entry 
and click `Release`.
-
-### Deploy source release to dist.apache.org
-
-Copy the source release from the `dev` repository to the `release` repository at `dist.apache.org`
-using Subversion.
-
-### Merge website pull request
-
-Merge the website pull request to list the release created earlier.
-
-### Mark the version as released in Jira
-
-In Jira, inside [version management](https://issues.apache.org/jira/plugins/servlet/project-config/CARBONDATA/versions),
-hover over the current release and a settings menu will appear. Click `Release`, and select
-today's date.
-
-_Checklist to proceed to the next step:_
-
-1. Maven artifacts released and indexed in the
- [Maven Central 
repository](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.carbondata%22)
-2. Source distribution available in the release repository of
- 
[dist.apache.org](https://dist.apache.org/repos/dist/release/incubator/carbondata/)
-3. Website pull request to list the release merged
-4. Release version finalized in Jira
-
-## Promote the release
-
-Once the release has been finalized, the last step of the process is to promote the release
-within the project and beyond.
-
-### Apache mailing lists
-
-Announce on the dev@ mailing list that the release has been finished.
- 
-Announce on the user@ mailing list that the release is available, listing 
major improvements and 
-contributions.
-
-While in incubation, announce the release on the Incubator's general@ mailing 
list.
-
-_Checklist to declare the process completed:_
-
-1. Release announced on the user@ mailing list.
-2. Release announced on the Incubator's general@ mailing list.
-3. Completion declared on the dev@ mailing list.
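
The approval rule quoted in the vote templates of the release guide above (majority approval, with at least three affirmative PMC votes) can be sketched as a small check. This is only an illustrative Python sketch: the `Vote` type and the `release_vote_passes` function are hypothetical names, not part of any Apache tooling, and the sketch assumes that only binding votes count toward the majority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 (approve) or -1 (do not approve)
    binding: bool  # True when the vote is cast by a PMC member


def release_vote_passes(votes, min_binding_approvals=3):
    """Return True if the vote meets the rule quoted in the guide.

    Assumption: only binding votes count toward the majority.
    """
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value == 1)
    objections = sum(1 for v in binding if v.value == -1)
    # At least three binding +1 votes, and more binding +1 than -1 votes.
    return approvals >= min_binding_approvals and approvals > objections


votes = [
    Vote("approver 1", 1, True),
    Vote("approver 2", 1, True),
    Vote("approver 3", 1, True),
    Vote("approver 4", 1, False),  # non-binding, listed but not counted
]
print(release_vote_passes(votes))  # True
```

The result email still lists every approver; a check like this only helps the Release Manager confirm the tally before sending it.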

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/supported-data-types-in-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/supported-data-types-in-carbondata.md 
b/src/site/markdown/supported-data-types-in-carbondata.md
deleted file mode 100644
index 8f271e3..0000000
--- a/src/site/markdown/supported-data-types-in-carbondata.md
+++ /dev/null
@@ -1,41 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-#  Data Types
-
-#### CarbonData supports the following data types:
-
-  * Numeric Types
-    * SMALLINT
-    * INT/INTEGER
-    * BIGINT
-    * DOUBLE
-    * DECIMAL
-
-  * Date/Time Types
-    * TIMESTAMP
-    * DATE
-
-  * String Types
-    * STRING
-    * CHAR
-
-  * Complex Types
-    * arrays: ARRAY``<data_type>``
-    * structs: STRUCT``<col_name : data_type COMMENT col_comment, ...>``

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/troubleshooting.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/troubleshooting.md 
b/src/site/markdown/troubleshooting.md
deleted file mode 100644
index 9181d83..0000000
--- a/src/site/markdown/troubleshooting.md
+++ /dev/null
@@ -1,247 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-# Troubleshooting
-This tutorial is designed to provide troubleshooting for end users and 
developers
-who are building, deploying, and using CarbonData.
-
-## Failed to load thrift libraries
-
-  **Symptom**
-
-  Thrift throws the following exception:
-
-  ```
-  thrift: error while loading shared libraries:
-  libthriftc.so.0: cannot open shared object file: No such file or directory
-  ```
-
-  **Possible Cause**
-
-  The complete path to the directory containing the libraries is not 
configured correctly.
-
-  **Procedure**
-
-  Follow the Apache thrift docs at 
[https://thrift.apache.org/docs/install](https://thrift.apache.org/docs/install)
 to install thrift correctly.
-
-## Failed to launch the Spark Shell
-
-  **Symptom**
-
-  The shell displays the following error:
-
-  ```
-  org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis
-  $OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis
-  $OverrideCatalog$$overrides_$e
-  ```
-
-  **Possible Cause**
-
-  The Spark Version and the selected Spark Profile do not match.
-
-  **Procedure**
-
-  1. Ensure your Spark version and the selected Spark profile are correct.
-
-  2. Use the following command:
-
-    ```
-     mvn -Pspark-2.1 -Dspark.version={yourSparkVersion} clean package
-    ```
-
-    Note: Refrain from using `mvn clean package` without specifying the profile.
-
-## Failed to execute load query on cluster.
-
-  **Symptom**
-
-  Load query failed with the following exception:
-
-  ```
-  Dictionary file is locked for updation.
-  ```
-
-  **Possible Cause**
-
-  The carbon.properties file is not identical in all the nodes of the cluster.
-
-  **Procedure**
-
-  Follow the steps to ensure the carbon.properties file is consistent across 
all the nodes:
-
-  1. Copy the carbon.properties file from the master node to all the other 
nodes in the cluster.
-     For example, you can use ssh to copy this file to all the nodes.
-
-  2. For the changes to take effect, restart the Spark cluster.
-
-## Failed to execute insert query on cluster.
-
-  **Symptom**
-
-  Insert query failed with the following exception:
-
-  ```
-  Dictionary file is locked for updation.
-  ```
-
-  **Possible Cause**
-
-  The carbon.properties file is not identical in all the nodes of the cluster.
-
-  **Procedure**
-
-  Follow the steps to ensure the carbon.properties file is consistent across 
all the nodes:
-
-  1. Copy the carbon.properties file from the master node to all the other 
nodes in the cluster.
-       For example, you can use scp to copy this file to all the nodes.
-
-  2. For the changes to take effect, restart the Spark cluster.
-
-## Failed to connect to hiveuser with thrift
-
-  **Symptom**
-
-  We get the following exception :
-
-  ```
-  Cannot connect to hiveuser.
-  ```
-
-  **Possible Cause**
-
-  The external process does not have permission to access hiveuser.
-
-  **Procedure**
-
-  Ensure that the hiveuser account in MySQL allows access from external processes.
-
-## Failure to read the metastore db during table creation.
-
-  **Symptom**
-
-  We get the following exception on trying to connect :
-
-  ```
-  Cannot read the metastore db
-  ```
-
-  **Possible Cause**
-
-  The metastore db is dysfunctional.
-
-  **Procedure**
-
-  Remove the metastore db from the carbon.metastore in the Spark Directory.
-
-## Failed to load data on the cluster
-
-  **Symptom**
-
-  Data loading fails with the following exception :
-
-   ```
-   Data Load failure exeception
-   ```
-
-  **Possible Cause**
-
-  The following issues can cause the failure:
-
-  1. The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files are not consistent across all nodes of the cluster.
-
-  2. Path to hdfs ddl is not configured correctly in the carbon.properties.
-
-  **Procedure**
-
-   Follow the steps to ensure the following configuration files are consistent across all the nodes:
-
-   1. Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
-      For example, you can use scp to copy these files to all the nodes.
-
-      Note: Set the path to hdfs ddl in carbon.properties in the master node.
-
-   2. For the changes to take effect, restart the Spark cluster.
-
-
-
-## Failed to insert data on the cluster
-
-  **Symptom**
-
-  Insertion fails with the following exception :
-
-   ```
-   Data Load failure exeception
-   ```
-
-  **Possible Cause**
-
-  The following issues can cause the failure:
-
-  1. The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files are not consistent across all nodes of the cluster.
-
-  2. Path to hdfs ddl is not configured correctly in the carbon.properties.
-
-  **Procedure**
-
-   Follow the steps to ensure the following configuration files are consistent across all the nodes:
-
-   1. Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
-      For example, you can use scp to copy these files to all the nodes.
-
-      Note: Set the path to hdfs ddl in carbon.properties in the master node.
-
-   2. For the changes to take effect, restart the Spark cluster.
-
-## Failed to execute concurrent operations (Load, Insert, Update) on a table by multiple workers
-
-  **Symptom**
-
-  Execution fails with the following exception :
-
-   ```
-   Table is locked for updation.
-   ```
-
-  **Possible Cause**
-
-  Concurrency not supported.
-
-  **Procedure**
-
-  The worker must wait for the query execution to complete and the table to release the lock before another query execution can succeed.
-
-## Failed to create a table with a single numeric column.
-
-  **Symptom**
-
-  Execution fails with the following exception :
-
-   ```
-   Table creation fails.
-   ```
-
-  **Possible Cause**
-
-  Behavior not supported.
-
-  **Procedure**
-
-  A single column that can be considered as dimension is mandatory for table 
creation.

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/useful-tips-on-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/useful-tips-on-carbondata.md 
b/src/site/markdown/useful-tips-on-carbondata.md
deleted file mode 100644
index b1ff903..0000000
--- a/src/site/markdown/useful-tips-on-carbondata.md
+++ /dev/null
@@ -1,180 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-# Useful Tips
-This tutorial guides you to create CarbonData Tables and optimize performance.
-The following sections will elaborate on the above topics :
-
-* [Suggestions to create CarbonData 
Table](#suggestions-to-create-carbondata-table)
-* [Configurations For Optimizing CarbonData 
Performance](#configurations-for-optimizing-carbondata-performance)
-
-## Suggestions to Create CarbonData Table
-
-Recently, CarbonData was used to analyze performance in the telecommunications field.
-The results of the analysis for table creation, with table sizes ranging from
-10 thousand to 10 billion rows and 100 to 300 columns, have been summarized below.
-
-The following table describes some of the columns from the table used.
- 
- 
-**Table Column Description**
-
-| Column Name | Data Type     | Cardinality | Attribution |
-|-------------|---------------|-------------|-------------|
-| msisdn      | String        | 30 million  | Dimension   |
-| BEGIN_TIME  | BigInt        | 10 Thousand | Dimension   |
-| HOST        | String        | 1 million   | Dimension   |
-| Dime_1      | String        | 1 Thousand  | Dimension   |
-| counter_1   | Numeric(20,0) | NA          | Measure     |
-| ...         | ...           | NA          | Measure     |
-| counter_100 | Numeric(20,0) | NA          | Measure     |
-
-CarbonData has more than 50 test cases; based on these, we have the following suggestions to enhance query performance:
-
-
-
-* **Put the frequently-used column filter in the beginning**
-
-  For example, if the MSISDN filter is used in most queries, put MSISDN in the first column.
-The create table command can be modified as suggested below:
-
-```
-  create table carbondata_table(
-  msisdn String,
-  ...
-  )STORED BY 'org.apache.carbondata.format' 
-  TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,..',
-  'DICTIONARY_INCLUDE'='...');
-```
-  
-  Now the query with MSISDN in the filter will be more efficient.
-
-
-* **Put the frequently-used columns in the order of low to high cardinality**
-  
-  If the table in the specified query has multiple columns which are 
frequently used to filter the results, it is suggested to put
-  the columns in the order of cardinality low to high. This ordering of 
frequently used columns improves the compression ratio and 
-  enhances the performance of queries with filter on these columns.
-  
-  For example if MSISDN, HOST and Dime_1 are frequently-used columns, then the 
column order of table is suggested as 
-  Dime_1>HOST>MSISDN as Dime_1 has the lowest cardinality. 
-  The create table command can be modified as suggested below :
-
-```
-  create table carbondata_table(
-  Dime_1 String,
-  HOST String,
-  MSISDN String,
-  ...
-  )STORED BY 'org.apache.carbondata.format' 
-  TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST..',
-  'DICTIONARY_INCLUDE'='Dime_1..');
-```
-
-
-* **Put the Dimension type columns in order of low to high cardinality**
-
-  If no particular columns are frequently used in filters, then it is suggested to order all the columns of dimension type from low to high cardinality.
-The create table command can be modified as below:
-
-```
-  create table carbondata_table(
-  Dime_1 String,
-  BEGIN_TIME bigint,
-  HOST String,
-  MSISDN String,
-  ...
-  )STORED BY 'org.apache.carbondata.format' 
-  TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..',
-  'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..');
-```
-
-
-* **For measure type columns that do not require high accuracy, replace the Numeric(20,0) data type with the Double data type**
-
-  For columns of measure type that do not require high accuracy, it is suggested to replace the Numeric data type with Double to enhance
-query performance. The create table command can be modified as below:
-
-```
-  create table carbondata_table(
-  Dime_1 String,
-  BEGIN_TIME bigint,
-  HOST String,
-  MSISDN String,
-  counter_1 double,
-  counter_2 double,
-  ...
-  counter_100 double
-  )STORED BY 'org.apache.carbondata.format' 
-  TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
-  'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
-```
-  Performance analysis of the test case shows a reduction in query execution time from 15 to 3 seconds, improving performance by nearly 5 times.
-
- 
-* **Columns with incremental values should be placed at the end of dimensions**
-
-  Consider a scenario where data is loaded each day and the start_time is incremental for each load. In this case, it is
-suggested to put start_time at the end of dimensions.
-
-  Incremental values make efficient use of the min/max index. The create table command can be modified as below:
-
-```
-  create table carbondata_table(
-  Dime_1 String,
-  HOST String,
-  MSISDN String,
-  counter_1 double,
-  counter_2 double,
-  BEGIN_TIME bigint,
-  ...
-  counter_100 double
-  )STORED BY 'org.apache.carbondata.format' 
-  TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI',
-  'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); 
-```
-
-
-* **Avoid adding high cardinality columns to dictionary**
-
-  If the system has a low memory configuration, it is suggested to exclude high cardinality columns from the dictionary to
-enhance load performance. Creating a dictionary for high cardinality columns at load time degrades load performance due to
-excessive memory usage.
-
-  By default, CarbonData determines the cardinality at the first data load and allows dictionary creation only if the cardinality is less than
-1 million.
-
-
-## Configurations for Optimizing CarbonData Performance
-
-Recently we carried out performance POCs on CarbonData for the finance and telecommunication fields. They involved detailed queries and aggregation
-scenarios. After the POCs were completed, the configurations that impact performance were identified and are tabulated below:
-
-| Parameter | Location | Used For  | Description | Tuning |
-|----------------------------------------------|-----------------------------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | Data loading | During data loading, local temp space is used to sort the data. This number specifies the minimum number of intermediate files after which the merge sort is initiated. | Increasing this parameter improves load performance. For example, increasing the value from 20 to 100 raised data load performance from 35 MB/s to more than 50 MB/s. Higher values of this parameter consume more memory during the load. |
-| carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | Data loading | Specifies the number of cores used for data processing during data loading in CarbonData. | If more CPU cores are available, increase this value to improve load performance. For example, if the value is increased from 2 to 4, CSV reading performance can increase about 1 times. |
-| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and the number of compacted segments to be merged in stage 2. | Each CarbonData load creates one segment. If every load is small, many small files accumulate over time and degrade query performance. Configuring this parameter merges the small segments into one big segment, which sorts the data and improves performance. For example, in one telecommunication scenario, performance improved about 2 times after minor compaction. |
-| spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | The number of tasks started when Spark shuffles data. | The value can be 1 to 2 times the number of executor cores. In an aggregation scenario, reducing the number from 200 to 32 reduced the query time from 17 to 9 seconds. |
-| num-executors/executor-cores/executor-memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData queries. | In the bank scenario, 4 CPU cores and 15 GB of memory per executor gave good performance. For these two values, more is not always better; they must be balanced when resources are limited. For example, in the bank scenario each node has 32 CPU cores but only 64 GB of memory, so we cannot allocate more CPU cores with less memory per executor. With 4 cores and 12 GB per executor, GC sometimes occurred during queries and increased the query time from 3 seconds to more than 15 seconds. In this scenario, increase the memory or decrease the CPU cores. |
-| carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records returned from the block scan. | This parameter is very important in limit scenarios. For example, if the query limit is 1000 and this value is set to 3000, the scan returns 3000 records but Spark takes only 1000 rows, so the remaining 2000 are wasted. In one Finance test case with a limit of 1000, setting this value to 100 improved performance about 2 times compared with setting it to 12000. |
-| carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use YARN local directories for multi-table load disk load balancing. | If this is set to true, CarbonData uses YARN local directories for multi-table load disk load balancing, which improves data load performance. |
-
-
- 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/user-guide-toc.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/user-guide-toc.md 
b/src/site/markdown/user-guide-toc.md
deleted file mode 100755
index 5771e10..0000000
--- a/src/site/markdown/user-guide-toc.md
+++ /dev/null
@@ -1,46 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-# User Guide
-Welcome to Apache CarbonData. Apache CarbonData (incubating) is a new big data file format for faster interactive queries. It uses advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which helps speed up queries by an order of magnitude over petabytes of data.
-This user guide provides a detailed description of CarbonData and its features.
-
-Let's get started!
-
-* [Overview](overview-of-carbondata.md)
-    * Introduction
-    * Features
-    * [Data Types](supported-data-types-in-carbondata.md)
-    * [CarbonData File Structure](file-structure-of-carbondata.md)
-* [Installation Guide](installation-guide.md)
-    * Installing and Configuring CarbonData on Standalone Spark Cluster
-    * Installing and Configuring CarbonData on "Spark on YARN" Cluster
-* [Configuring CarbonData](configuration-parameters.md)
-    * System Configuration
-    * Performance Configuration
-    * Miscellaneous Configuration
-    * Spark Configuration
-* [Using CarbonData](using-carbondata.md)
-    * [Data Management](data-management.md)
-    * [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
-    * [DML Operations on CarbonData](dml-operation-on-carbondata.md)
-
-
-
-
-

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/using-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/using-carbondata.md 
b/src/site/markdown/using-carbondata.md
deleted file mode 100644
index 83a3655..0000000
--- a/src/site/markdown/using-carbondata.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Using CarbonData
-This tutorial discusses the disciplines related to management of data in Apache CarbonData.
-Each section below gives a brief introduction to one of the disciplines related to data
-management.
-
-## Data Management
-This section deals with the disciplines related to managing data in the application,
-focusing on conceptual details of operations such as loading data, deleting data, updating data
-and compacting data.
-
-For complete details refer to [Data Management](data-management.md)
-
-## Data Definition Language Support
-This section deals with the aspects related to creation and modification of the structure of the database.
-It discusses in detail:
-
-*  Table creation
-*  Table deletion
-*  Table description
-*  Compaction
-
-For complete details refer to [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
-
-## Data Manipulation Language Support
-This section deals with the aspects related to data manipulation in the database. It discusses in detail selecting, loading and deleting data in a database.
-This manipulation comprises:
-
-*  Loading data into database tables
-*  Retrieving existing data
-*  Deleting data from existing tables
-*  Deleting segments from existing tables
-*  Updating data in existing tables
-
-For complete details refer to [DML Operations on CarbonData](dml-operation-on-carbondata.md)
-

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/pdf.xml
----------------------------------------------------------------------
diff --git a/src/site/pdf.xml b/src/site/pdf.xml
deleted file mode 100644
index 710e7c7..0000000
--- a/src/site/pdf.xml
+++ /dev/null
@@ -1,38 +0,0 @@
-<document xmlns="http://maven.apache.org/DOCUMENT/1.0.1"
-          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-          xsi:schemaLocation="http://maven.apache.org/DOCUMENT/1.0.1 http://maven.apache.org/xsd/document-1.0.1.xsd"
-          outputName="maven-pdf-plugin">
-
-  <meta>
-    <title>CarbonData Documentation</title>
-    <author>The Apache CarbonData Community</author>
-  </meta>
-
-  <toc name="Table of Contents">
-    <item name="Quick Start" ref='quick-start-guide.md'/>
-    <item name="User Guide" ref='user-guide-toc.md'/>
-    <item name="Overview" ref='overview-of-carbondata.md'/>
-    <item name="CarbonData File Structure" 
ref='file-structure-of-carbondata.md'/>
-    <item name="Data Types" ref='supported-data-types-in-carbondata.md'/>
-    <item name="Installation" ref='installation-guide.md'/>
-    <item name="Configuring CarbonData" ref='configuration-parameters.md'/>
-    <item name="Using CarbonData" ref='using-carbondata.md'/>
-    <item name="Data Management" ref='data-management.md'/>
-    <item name="DDL" ref='ddl-operation-on-carbondata.md'/>
-    <item name="DML" ref='dml-operation-on-carbondata.md'/>
-    <item name="Useful Tips" ref='useful-tips-on-carbondata.md'/>
-    <item name="Troubleshooting" ref='troubleshooting.md'/>
-    <item name="FAQs" ref='faq.md'/>
-
-  </toc>
-
-  <cover>
-    <companyLogo>../../src/site/projectLogo/ApacheLogo.png</companyLogo>
-    <projectLogo>../../src/site/projectLogo/CarbonDataLogo.png</projectLogo>
-    <coverTitle>Apache CarbonData</coverTitle>
-    <coverSubTitle>Ver 1.0 </coverSubTitle>
-    <coverType>Documentation</coverType>
-    <projectName>Apache CarbonData</projectName>
-    <companyName>The Apache Software Foundation</companyName>
-  </cover>
-</document>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/projectLogo/ApacheLogo.png
----------------------------------------------------------------------
diff --git a/src/site/projectLogo/ApacheLogo.png 
b/src/site/projectLogo/ApacheLogo.png
deleted file mode 100644
index 9d25899..0000000
Binary files a/src/site/projectLogo/ApacheLogo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/projectLogo/CarbonDataLogo.png
----------------------------------------------------------------------
diff --git a/src/site/projectLogo/CarbonDataLogo.png 
b/src/site/projectLogo/CarbonDataLogo.png
deleted file mode 100644
index bc09b23..0000000
Binary files a/src/site/projectLogo/CarbonDataLogo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
deleted file mode 100644
index 997caa6..0000000
--- a/src/site/site.xml
+++ /dev/null
@@ -1,11 +0,0 @@
-<?xml version="1.0" encoding="ISO-8859-1"?>
-<project name="Apache CarbonData">
-<bannerLeft>
-
-</bannerLeft>
-<bannerRight>
-
-</bannerRight>
-<body>
-</body>
-</project>
