Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
  
    @xuchuanyin There is a reason why we copy the file instead of writing 
directly to HDFS.
    1. We make sure that one complete carbondata file goes into a single HDFS 
block: when copying it from local disk to HDFS, we specify the block size. 
Otherwise, query performance suffers a lot.
    2. It removes the overhead of writing to HDFS directly (which internally 
writes the replicas as well) from the loading path: the copy starts in a 
different thread, so the main loading flow is not blocked.
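
The two points above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the actual CarbonData code: the class `LocalThenCopySketch`, the helpers `blockSizeFor` and `copyAsync`, and the 512-byte checksum chunk are assumptions made for the example, and `Files.copy` stands in for the real HDFS write. With Hadoop on the classpath, the computed value would be passed as the `blockSize` argument of `FileSystem.create(path, overwrite, bufferSize, replication, blockSize)`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalThenCopySketch {
    // Assumption: the block size must be a multiple of the HDFS checksum
    // chunk (dfs.bytes-per-checksum, commonly 512 bytes).
    static final long CHECKSUM_CHUNK = 512L;

    // Pick a block size that holds the whole file, rounded up to a
    // multiple of the checksum chunk, so the file fits in one block.
    static long blockSizeFor(long fileSize, long configuredBlockSize) {
        long needed = Math.max(fileSize, configuredBlockSize);
        long remainder = needed % CHECKSUM_CHUNK;
        return remainder == 0 ? needed : needed + (CHECKSUM_CHUNK - remainder);
    }

    // Copy the finished local file on a background thread so the caller
    // (the loading flow) is not blocked. Returns the block size that
    // would be requested for the target file.
    static Future<Long> copyAsync(ExecutorService pool, Path local, Path target) {
        return pool.submit(() -> {
            long blockSize = blockSizeFor(Files.size(local), 0L);
            // Stand-in for the HDFS write; a real implementation would open
            // the target with the explicit blockSize and stream the bytes.
            Files.copy(local, target, StandardCopyOption.REPLACE_EXISTING);
            return blockSize;
        });
    }

    public static void main(String[] args) throws Exception {
        Path local = Files.createTempFile("part-0-", ".carbondata");
        Path target = Files.createTempFile("copied-", ".carbondata");
        Files.write(local, new byte[1000]);

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> pending = copyAsync(pool, local, target);
            // The loading flow could continue with the next file here.
            System.out.println("block size requested: " + pending.get());
            System.out.println("bytes copied: " + Files.size(target));
        } finally {
            pool.shutdown();
            Files.deleteIfExists(local);
            Files.deleteIfExists(target);
        }
    }
}
```

The loading thread only submits the copy and moves on; it waits on the returned `Future` when it actually needs the file to be in place.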

