GitHub user mohammadshahidkhan reopened a pull request:

    https://github.com/apache/carbondata/pull/1418

    [WIP] Support db location

    **Problem:**
    Currently Carbon stores all databases and tables under the single configured location "carbon.storeLocation". As a result, even in a federated cluster (multiple independent Namespaces/NameNodes), only one Namespace/NameNode can be used as storage for the databases and tables.
    The requirement is to let the user specify a LOCATION when creating a database; the location can be a viewfs path or a nameservice path.
    With the database location attribute, different NameNodes can be used to store different databases. Using different NameNodes for different databases should scale the read/write throughput.
    **Solution:**
    Setting the LOCATION attribute on a new database makes it possible to work with sets of files in an HDFS directory structure outside the fixed carbon store location.
    In the CREATE DATABASE command the user can specify the location where the database, and the tables under that database namespace, are stored.
    1.  A database can be created at a user-defined location by using the CREATE DATABASE command with the LOCATION attribute.
    Syntax:
    CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT 'database_comment'] [LOCATION hdfs_path];
    The HDFS path can be a viewfs path or a nameservice path.
    2.  The database name and database location will be retrieved from the HiveSessionCatalog. The tables will be retrieved by scanning the table folders.
    3.  Read/write operations on old databases, and on new databases created without the LOCATION attribute, should be performed on the fixed carbon store location.
    4.  The table path formation will be the same as in Hive. The table path is formed from the database location or the fixed Carbon store location, plus the table name, as given below.
    There will be three possible scenarios:
    I.  Table path for databases defined with the location attribute.
              tablePath = databaseLocation + "/" + tableName
    
    II. Table path for databases defined without the location attribute.
              tablePath = carbon.storeLocation + "/" + database_Name + ".db" + "/" + tableName
    
    III. New table path for the default database.
              tablePath = carbon.storeLocation + "/" + tableName
    
    5.  Carbon should support hybrid nameservice paths, i.e. different operations can be performed on different nameservices. For example, in a load command the raw CSV files can be read from the nameservice ns1, i.e. "hdfs://ns1/user/data.csv", while the database location points to the viewfs path "viewfs://federation_cluster/dir1/carbonstore".
    
    Example:
    CREATE DATABASE carbon LOCATION 'viewfs://federation_cluster/dir1/carbonstore';
    CREATE TABLE carbon.carbon_table (c1 string, c2 int) stored by 'carbondata';
    LOAD DATA INPATH 'hdfs://ns1/user/data.csv' into table carbon.carbon_table;
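
The three table path scenarios in point 4 can be sketched as a small Python function. This is only an illustration of the rules as written in the description, not the code in this pull request: the function name `table_path`, the constant `DEFAULT_DATABASE`, and the store location `hdfs://ns1/carbon/store` used in the usage lines are all hypothetical. It also assumes that an explicit database location takes precedence over the other two rules, and that the default database is literally named `default`.

```python
from typing import Optional

# Assumption: the default database is named "default".
DEFAULT_DATABASE = "default"


def table_path(store_location: str,
               database_name: str,
               table_name: str,
               database_location: Optional[str] = None) -> str:
    """Build a table path following scenarios I, II and III of the description."""
    if database_location:
        # Scenario I: database created with a LOCATION attribute.
        return database_location.rstrip("/") + "/" + table_name

    base = store_location.rstrip("/")
    if database_name == DEFAULT_DATABASE:
        # Scenario III: default database, table sits directly under the store.
        return base + "/" + table_name

    # Scenario II: database without LOCATION, stored under "<name>.db".
    return base + "/" + database_name + ".db" + "/" + table_name


# Usage with the database from the example above; the store location is made up.
store = "hdfs://ns1/carbon/store"
print(table_path(store, "carbon", "carbon_table",
                 "viewfs://federation_cluster/dir1/carbonstore"))
print(table_path(store, "sales", "t1"))
print(table_path(store, "default", "t1"))
```

With the example database, the first call yields a path under the viewfs location, while the other two fall back to the fixed store location.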


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mohammadshahidkhan/incubator-carbondata support_db_location

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1418
    
----
commit efc5edb8f721bc22edcaa8af16179b5ed734d32b
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-10T05:20:11Z

    Support database location:
    The database name and database location will be retrieved from the HiveSessionCatalog. The tables will be retrieved by scanning the table folders.
    Read/write operations on old databases, and on new databases created without the LOCATION attribute, should be performed on the fixed carbon store location.

commit bdb11c9a57bb5b7adc77a7d73ff242ef438be7cc
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-10T06:00:43Z

    Support configurable path for the store modification info file (modifiedTime.mdt).

commit 8bf6ac4b37d40bde6c9223e2b06609e2c771c004
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-10T10:17:00Z

    Fixed drop database issue

commit 6a336153297c6edc38010f6d871511a85b916531
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-11T10:00:51Z

    Corrected lock file location

commit b60a874ca6ba60f71ec030c96b397d0d20562fbe
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-16T08:58:07Z

    Making carbon table path same as hive.

commit d0d15148c0c4cc3f9a7c53c725138bb64def3e61
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-18T15:31:13Z

    fixed failing test cases

commit 3a32d6ba03b1a813cf348db21c9e5190ba442c7f
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-19T10:04:04Z

    fixed failing test cases

commit 151f25264e3cc1ecd1a78b051b5b9dc408aa3d90
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-19T14:02:53Z

    Added database location test and fixed existing failing test cases

commit 92599d21f06085299aa3357d64c6cc5efd075de9
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2017-10-19T18:11:32Z

    Fixed alter rename table + Corrected test cases
    Fixed drop table issue

----

