Hi Shahid,

Can the new DDL be similar to an Import/Export syntax? e.g.,

  EXPORT TABLE tablename TO 'export_target_path'
    -- Export the actual table & its associated agg tables as a zip file

  IMPORT [TABLE tablename] FROM 'source_path'
    -- Import data from the zip file into the "carbon store path" & register
    -- the table as mentioned in your mail; tablename can be optional here.

==> If tablename is not mentioned, or the mentioned table does not exist, we
can assume the table does not exist & needs to be created.
==> If tablename is mentioned & the table exists, then we can treat it as an
incremental data update or a schema evolution.
==> We can validate the checksums of the existing files against the new files
& overwrite/remove stale files.
==> If a schema update happened, then we can update the schema in the
metastore the same way as we do for the add/drop column commands.
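To make the intended usage concrete, here is a rough sketch of how I picture
the two commands being used (this syntax is only a proposal, and the table &
path names are illustrative):

  -- On the source cluster: pack the table data, its agg tables & schema
  -- into a single archive.
  EXPORT TABLE sales TO 'hdfs://source-cluster/backup/sales.zip';

  -- On the target cluster, fresh import: no table name given, so the table
  -- is created & registered from the schema inside the archive.
  IMPORT FROM 'hdfs://target-cluster/backup/sales.zip';

  -- Import into an existing table: treated as an incremental data update or
  -- schema evolution, with checksum validation against the existing files.
  IMPORT TABLE sales FROM 'hdfs://target-cluster/backup/sales_delta.zip';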
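Similarly, for the REGISTER TABLES command in your proposal below, my
understanding is that the flow would roughly expand as follows (the store
path, db & table names are illustrative):

  -- Precondition: the old store's database directory (including any agg
  -- tables) has already been copied to the new store location.
  REGISTER TABLES FROM '/carbon/store/salesdb';

  -- For each schema found under the path, the table would then be
  -- registered to the hive catalog along the lines of:
  CREATE TABLE sales USING carbondata OPTIONS (
    tableName "salesdb.sales",
    dbName "salesdb",
    tablePath "/carbon/store/salesdb/sales",
    path "/carbon/store/salesdb/sales"
  );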
I think all newer CarbonData versions are backward compatible; any
restrictions or thoughts on cross-version import/export?

---
Regards,
Naresh P R

On Thu, Nov 23, 2017 at 4:47 PM, Mohammad Shahid Khan <
mohdshahidkhan1...@gmail.com> wrote:

> Hi Dev,
>
> *Please find the initial solution.*
>
> *CarbonData table backup and recovery*
>
> *Background*
>
> A customer has created a CarbonData table into which a very large volume
> of data has already been loaded. They now install another cluster that
> should use the same data as this table without loading it again, because
> loading the data takes a long time; so they want to directly back up this
> table's data and recover it in the other cluster. After recovery, the
> user can use the data as a normal CarbonData table.
>
> *Requirement Description*
>
> A CarbonData table should support backing up its data and recovering it
> without needing to load the data again.
>
> To reuse the CarbonData table of another cluster, a DDL command should be
> provided to create the CarbonData table from the existing carbon table
> schema.
>
> *Solution*
>
> Currently CarbonData has the below two types of tables:
>
> 1. Normal table
>
> 2. Pre-aggregate table
>
> CarbonData should provide a DDL command to create the table from existing
> table data. The below DDL command could be used to create the table from
> existing table data:
>
> *REGISTER TABLES FROM $dbPath*
>
> i. The database path will be scanned to get all table schemas.
>
> ii. Each schema will be read to get the database name, table name and
> column details.
>
> iii. The *table will be registered to the hive catalog with the below
> details:*
>
> *CREATE TABLE $tbName USING carbondata OPTIONS (*
> *  tableName "$dbName.$tbName",*
> *  dbName "$dbName",*
> *  tablePath "$tablePath",*
> *  path "$tablePath" )*
>
> *Precondition:*
>
> i. Before executing this command, the old table schema and data should be
> copied into the new store location.
>
> ii. If the table has aggregate tables, then all the aggregate tables
> should also be copied to the new store location.
>
> *Validation:*
>
> 1. If the database does not exist, the registration will fail.
> 2. A table will be registered only if the same table name is not already
> registered.
> 3. If the table has aggregate tables, then all of them should be
> registered to the hive catalog; if any aggregate table does not exist,
> the table creation operation should fail.
>
> Regards,
>
> Shahid