GitHub user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1825
  
    @ravipesala I reconsidered the questions you mentioned and fixed them as 
below:
    
    1. I use the user-specified `table_blocksize` as the HDFS block size of the 
data files. In the current implementation, this block size is large enough to 
hold the entire file.
    
    2. I write the data files directly to HDFS with a replication factor of 1 
in the main thread and complete the remaining replicas in a background thread 
-- the same approach as before.
    
    After implementing this, I tested it on a 3-node cluster: data loading 
performance was the same as before, while the end-to-end `total size of disk 
write decreased by about 11%`.

