Hi,

When writing a dataframe to a carbon table, if the dataframe is costly to 
compute, it is better to materialize it by saving it to temporary CSV files and 
then loading those files into the carbon table. If the dataframe is not costly 
to compute (for example, it is the scan result of a hive table), the user can 
set the tempCSV option to false, and carbon will load it directly.
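
As a rough illustration of the choice above, here is a minimal sketch. It 
assumes a Spark session with CarbonData on the classpath; the helper name 
writeToCarbon, the computeIsCostly flag and the Overwrite save mode are 
hypothetical choices made only for this example, and the "carbondata" format 
name and "tableName" option should be checked against your CarbonData version.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object CarbonWriteSketch {
  // Hypothetical helper, not part of CarbonData: picks the tempCSV setting
  // from whether the DataFrame is expensive to compute.
  def writeToCarbon(df: DataFrame, tableName: String, computeIsCostly: Boolean): Unit = {
    df.write
      .format("carbondata")                        // assumed data source short name
      .option("tableName", tableName)              // target carbon table
      .option("tempCSV", computeIsCostly.toString) // "true": stage as CSV first; "false": load directly
      .mode(SaveMode.Overwrite)                    // assumption for this sketch only
      .save()
  }
}
```

For a dataframe that is just a scan of a hive table, this would be called as 
CarbonWriteSketch.writeToCarbon(df, "carbon_table", computeIsCostly = false).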

Regards,
Jacky


> On Oct 17, 2017, at 11:17 PM, 徐传印 <[email protected]> wrote:
> 
> Hi, community:
> 
> While going through the DataFrame.write related code in CarbonData, I found 
> an option that controls whether the dataframe's data is first saved to a 
> temporary directory on disk as CSV.
> 
> My question is: why do we need this step, which consumes extra disk IO, and 
> why is the option (tempCSV) true by default?
> 
> The related code can be found here:
> 
> https://github.com/apache/carbondata/blob/master/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala#L45
> 
> https://github.com/apache/carbondata/blob/master/integration/spark-common/src/main/scala/org/apache/carbondata/spark/CarbonOption.scala#L43


