Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1825
@ravipesala I reconsidered the questions you raised and addressed them as
below:
1. I use the user-specified `table_blocksize` as the block size of the data
files. In the current implementation this block size is large enough to hold
the entire file.
2. I write the data files directly to HDFS with a replication factor of 1 in
the main thread and complete the remaining replicas in a separate thread --
the same way as before.
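
The deferred-replication pattern in point 2 can be sketched roughly as below. This is only an illustration of the idea (one synchronous copy, remaining replicas completed on a background thread), not the CarbonData implementation; the function name and the use of local file copies in place of HDFS replication are assumptions for the sketch.

```python
import shutil
import threading

def write_with_deferred_replication(data, primary_path, replica_paths):
    """Illustrative sketch: write one copy synchronously, then produce
    the remaining replicas on a background thread so the loading thread
    is not blocked (analogous to creating an HDFS file with replication=1
    and raising the replication factor asynchronously)."""
    # Main thread: write a single copy (the "replication=1" write).
    with open(primary_path, "wb") as f:
        f.write(data)

    # Background thread: complete the remaining replicas.
    def replicate():
        for path in replica_paths:
            shutil.copyfile(primary_path, path)

    t = threading.Thread(target=replicate, daemon=True)
    t.start()
    return t  # caller may join() if it needs all replicas durable
```

The main thread returns as soon as the first copy is durable, which is why loading throughput stays unchanged while the extra replica writes happen off the critical path.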
After implementing this, I tested it on a 3-node cluster: data loading
performance was the same as before, while the end-to-end `total size of
disk writes decreased by about 11%`.
---