Hi,

What Carbon provides are concepts at two levels:
1. A file format, which can be used by a compute engine to read and write data. 
CarbonData is a self-describing, type-aware columnar file format for the Hadoop 
environment, in the same way that ORC and Parquet are.

2. Table-level storage, which includes not just the file format but also the 
aggregated index files (datamaps), the global dictionary, and segment metadata. 
It provides more functionality for segment management and SQL optimization 
(like lazy decode) through deep integration with the compute engine (currently 
only deep Spark integration is supported; a short sketch of the 
segment-management SQL follows this list).

In my opinion, these two levels of abstraction are the core of the CarbonData 
project. But the database concept should belong to the compute engine, which 
manages the store-level metadata, since Spark, Hive, and Presto all have this 
part in their own layers.

I think what CarbonData is currently missing is that, for table-level storage, 
the user should be able to specify the table location when saving the table 
data. This is to achieve: 
1. The compute engine can manage the carbon table location the same way as ORC 
and Parquet tables, and the user can use the same API or SQL syntax to create a 
CarbonData table, e.g. `df.write.format("carbondata").save("path")` with the 
Spark DataFrame API. There should be no carbon storePath involved (see the 
first sketch after this list).

2. The user should be able to save one table to an HDFS location and another to 
an S3 location in the same context. Since several carbon properties are 
involved when determining the filesystem type (such as the LOCK file type), it 
is currently not possible to create tables on HDFS and on S3 in the same 
context, which also breaks the table-level abstraction (see the second sketch 
after this list).
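For point 1, here is a minimal sketch of the desired DataFrame usage, assuming 
a "carbondata" data source is registered with Spark; the paths are 
hypothetical. Today the table location is derived from a session-wide 
storePath instead:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("carbon-location").getOrCreate()
    val df = spark.range(0, 100).toDF("id")

    // Desired: the path alone determines where the table lives,
    // exactly as with Parquet or ORC.
    df.write.format("carbondata").save("hdfs://ns1/warehouse/t1")
    df.write.format("parquet").save("hdfs://ns1/warehouse/t2")  // same API shape

    // Today (CarbonData 1.x) the location comes from the global storePath
    // passed when the session is created, e.g.:
    //   import org.apache.spark.sql.CarbonSession._
    //   SparkSession.builder().getOrCreateCarbonSession("hdfs://ns1/carbonstore")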
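For point 2, the goal would look like the following in a single session. This 
is a hypothetical sketch (the LOCATION clause for carbon tables and the table 
names are made up), since filesystem-dependent settings such as 
carbon.lock.type are process-wide today rather than per table:

    // Hypothetical: two carbon tables on different filesystems, one session.
    spark.sql(
      "CREATE TABLE t_hdfs (id INT, name STRING) STORED BY 'carbondata' " +
        "LOCATION 'hdfs://ns1/warehouse/t_hdfs'")
    spark.sql(
      "CREATE TABLE t_s3 (id INT, name STRING) STORED BY 'carbondata' " +
        "LOCATION 's3a://my-bucket/warehouse/t_s3'")
    // Currently this cannot work: properties like carbon.lock.type
    // (e.g. HDFSLOCK vs. LOCALLOCK) select one filesystem for the whole
    // process, which breaks the per-table abstraction.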

Regards,
Jacky

> On Oct 3, 2017, at 10:36 PM, Mohammad Shahid Khan <mohdshahidkhan1...@gmail.com> wrote:
> 
> Hi Dev,
> Please find the design document for Support Database Location Configuration 
> while Creating Database.
> 
> Regards,
> Shahid
> <Support Database Location.docx>


