Hi Shahid,

Can the new DDL be similar to an Import/Export syntax? e.g.,

  EXPORT TABLE tablename TO 'export_target_path'
    -- Export the actual table & its associated agg tables as a zip file

  IMPORT [TABLE tablename] FROM 'source_path'
    -- Import data from the zip file into the "carbon store path" & register
    -- the table as mentioned in your mail; tablename can be optional here.

==> If tablename is not mentioned, or the mentioned table does not exist, we
can assume the table does not exist & needs to be created.
==> If tablename is mentioned & the table exists, then we can treat it as an
incremental data update or a schema evolution.
==> We can validate the checksums of the existing files against the new files
& overwrite/remove stale files.
==> If a schema update happened, then we can update the schema in the
metastore the same way as we do for the add/drop column commands.
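To make the intended usage concrete, here is a rough sketch of how I picture
the two commands being used (this syntax is only a proposal, and the table &
path names are illustrative):

  -- On the source cluster: pack the table data, its agg tables & schema
  -- into a single archive.
  EXPORT TABLE sales TO 'hdfs://source-cluster/backup/sales.zip';

  -- On the target cluster, fresh import: no table name given, so the table
  -- is created & registered from the schema inside the archive.
  IMPORT FROM 'hdfs://target-cluster/backup/sales.zip';

  -- Import into an existing table: treated as an incremental data update or
  -- schema evolution, with checksum validation against the existing files.
  IMPORT TABLE sales FROM 'hdfs://target-cluster/backup/sales_delta.zip';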
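Similarly, for the REGISTER TABLES command in your proposal below, my
understanding is that the flow would roughly expand as follows (the store
path, db & table names are illustrative):

  -- Precondition: the old store's database directory (including any agg
  -- tables) has already been copied to the new store location.
  REGISTER TABLES FROM '/carbon/store/salesdb';

  -- For each schema found under the path, the table would then be
  -- registered to the hive catalog along the lines of:
  CREATE TABLE sales USING carbondata OPTIONS (
    tableName "salesdb.sales",
    dbName "salesdb",
    tablePath "/carbon/store/salesdb/sales",
    path "/carbon/store/salesdb/sales"
  );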
I think all newer CarbonData versions are backward compatible; any
restrictions or thoughts on cross-version import/export?

---
Regards,
Naresh P R

On Thu, Nov 23, 2017 at 4:47 PM, Mohammad Shahid Khan <
mohdshahidkhan1...@gmail.com> wrote:

> Hi Dev,
>
> *Please find the initial solution.*
>
> *CarbonData table backup and recovery*
>
> *Background*
>
> A customer has created a CarbonData table into which a very large volume
> of data has already been loaded. They now install another cluster that
> should use the same data as this table without loading it again, because
> loading the data takes a long time; so they want to directly back up this
> table's data and recover it in the other cluster. After recovery, the
> user can use the data as a normal CarbonData table.
>
> *Requirement Description*
>
> A CarbonData table should support backing up its data and recovering it
> without needing to load the data again.
>
> To reuse the CarbonData table of another cluster, a DDL command should be
> provided to create the CarbonData table from the existing carbon table
> schema.
>
> *Solution*
>
> Currently CarbonData has the below two types of tables:
>
> 1. Normal table
>
> 2. Pre-aggregate table
>
> CarbonData should provide a DDL command to create the table from existing
> table data. The below DDL command could be used to create the table from
> existing table data:
>
> *REGISTER TABLES FROM $dbPath*
>
> i. The database path will be scanned to get all table schemas.
>
> ii. Each schema will be read to get the database name, table name and
> column details.
>
> iii. The *table will be registered to the hive catalog with the below
> details:*
>
> *CREATE TABLE $tbName USING carbondata OPTIONS (*
> *  tableName "$dbName.$tbName",*
> *  dbName "$dbName",*
> *  tablePath "$tablePath",*
> *  path "$tablePath" )*
>
> *Precondition:*
>
> i. Before executing this command, the old table schema and data should be
> copied into the new store location.
>
> ii. If the table has aggregate tables, then all the aggregate tables
> should also be copied to the new store location.
>
> *Validation:*
>
> 1. If the database does not exist, the registration will fail.
> 2. A table will be registered only if the same table name is not already
> registered.
> 3. If the table has aggregate tables, then all of them should be
> registered to the hive catalog; if any aggregate table does not exist,
> the table creation operation should fail.
>
> Regards,
>
> Shahid