Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1825
@ravipesala I reconsidered the questions you raised and addressed them as
below:
1. I use the user-specified `table_blocksize` as the block size of the data
files. In the current implementation this block size is large enough to hold
the entire file.
2. I write the data files directly to HDFS with a replication factor of 1 in
the main thread and complete the remaining replicas in a separate thread --
the same way as before.
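
The deferred-replication pattern in point 2 can be sketched roughly as below. This is only an illustration of the idea (one synchronous copy, remaining replicas completed on a background thread), not the CarbonData implementation; the function name and the use of local file copies in place of HDFS replication are assumptions for the sketch.

```python
import shutil
import threading

def write_with_deferred_replication(data, primary_path, replica_paths):
    """Illustrative sketch: write one copy synchronously, then produce
    the remaining replicas on a background thread so the loading thread
    is not blocked (analogous to creating an HDFS file with replication=1
    and raising the replication factor asynchronously)."""
    # Main thread: write a single copy (the "replication=1" write).
    with open(primary_path, "wb") as f:
        f.write(data)

    # Background thread: complete the remaining replicas.
    def replicate():
        for path in replica_paths:
            shutil.copyfile(primary_path, path)

    t = threading.Thread(target=replicate, daemon=True)
    t.start()
    return t  # caller may join() if it needs all replicas durable
```

The main thread returns as soon as the first copy is durable, which is why loading throughput stays unchanged while the extra replica writes happen off the critical path.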
After implementing this, I tested it on a 3-node cluster: data loading
performance was the same as before, while the end-to-end `total size of
disk writes decreased by about 11%`.
---