Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2576#discussion_r206481821
  
    --- Diff: docs/s3-guide.md ---
    @@ -0,0 +1,63 @@
    +<!--
    +    Licensed to the Apache Software Foundation (ASF) under one or more 
    +    contributor license agreements.  See the NOTICE file distributed with
    +    this work for additional information regarding copyright ownership. 
    +    The ASF licenses this file to you under the Apache License, Version 2.0
    +    (the "License"); you may not use this file except in compliance with 
    +    the License.  You may obtain a copy of the License at
    +
    +      http://www.apache.org/licenses/LICENSE-2.0
    +
    +    Unless required by applicable law or agreed to in writing, software 
    +    distributed under the License is distributed on an "AS IS" BASIS, 
    +    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
    +    See the License for the specific language governing permissions and 
    +    limitations under the License.
    +-->
    +
    +# S3 Guide (Alpha Feature 1.4.1)
    +Amazon S3 is a cloud storage service recommended for storing large data files. You can use 
    +this feature if you want to store data on the Amazon cloud. Since the data is stored in cloud 
    +storage, there are no restrictions on the size of the data, and the data can be accessed from 
    +anywhere at any time.
    +CarbonData supports any object store that conforms to the Amazon S3 API.
    +
    +# Writing to Object Store
    +To store carbondata files in an object store, set the `carbon.storelocation` property to the 
    +object store path in the CarbonProperties file, for example 
    +`carbon.storelocation=s3a://mybucket/carbonstore`. With this property set, all tables are 
    +created at the specified object store path.
    +
    +If your existing store is HDFS and you want to store specific tables in an S3 location, set the 
    +`LOCATION` parameter while creating the table.
    +For example:
    +
    +```
    +CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore'
    +```
    +
    +For more details on creating a table, refer to 
    +[data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table).
    +
    +# Authentication
    +You need to set authentication properties to store carbondata files in an S3 location. For 
    +more details on authentication properties, refer to the 
    +[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties).
    +
    +Another way of setting the authentication parameters is as follows:
    +
    +```
    +SparkSession
    +  .builder()
    +  .master(masterURL)
    +  .appName("S3Example")
    +  .config("spark.driver.host", "localhost")
    +  .config("spark.hadoop.fs.s3a.access.key", "1111")
    +  .config("spark.hadoop.fs.s3a.secret.key", "2222")
    +  .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
    +  .getOrCreateCarbonSession()
    +```
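    +
    +Alternatively, the same S3A properties used in the example above can be supplied through `spark-defaults.conf` (the values below are placeholders, not working credentials):
    +
    +```
    +spark.hadoop.fs.s3a.access.key  1111
    +spark.hadoop.fs.s3a.secret.key  2222
    +spark.hadoop.fs.s3a.endpoint    1.1.1.1
    +```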
    +
    +# Recommendations
    +1. Object stores like S3 do not support the file-leasing mechanism (supported by HDFS) that is 
    +required to take locks, which ensure consistency between concurrent operations. It is therefore 
    +recommended to set the configurable lock path 
    +property ([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration)) 
    +to an HDFS directory.
    +2. Object stores are eventually consistent, meaning that a put request can take some time to be 
    +reflected when listing objects in the bucket; therefore, concurrent queries are not supported.
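    +
    +The lock path recommendation above can be applied, for example, with an entry like the following in the CarbonProperties file (the namenode host, port, and directory are placeholders):
    +
    +```
    +carbon.lock.path=hdfs://namenode:8020/carbon/locks
    +```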
    --- End diff --
    
    Changes to: Object Storage

