[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

ravipesala Wed, 17 Jan 2018 20:47:27 -0800

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
  
    @xuchuanyin There is a reason why we copy instead of writing directly to
HDFS.
    1. We make sure that one complete carbondata file goes to only one HDFS
block: while copying it from local disk to HDFS, we specify the block size.
Otherwise, a file spanning multiple blocks impacts query performance a lot.
    2. To avoid the overhead of writing to HDFS directly (it internally writes
the replicas as well), we do the copy in a separate thread so that it does not
block the main loading flow.
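    A minimal sketch of the two points above, written for illustration only
and not taken from CarbonData. The class `LocalToRemoteCopier` and its methods
`blockSizeFor` and `copyAsync` are hypothetical. A local destination path
stands in for HDFS so that the sketch runs on a plain JDK. With Hadoop, the
target stream would instead be opened with
`FileSystem.create(path, overwrite, bufferSize, replication, blockSize)`,
passing the computed block size. Rounding up to 512 bytes is an assumption
based on HDFS's default `dfs.bytes-per-checksum`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: the loader writes a carbondata file to local disk,
 * then copies it to the target store on a background thread, using a block
 * size large enough that the whole file fits in a single block.
 */
public class LocalToRemoteCopier {

    // Assumption: block size must be a multiple of the checksum chunk size
    // (HDFS default dfs.bytes-per-checksum is 512 bytes).
    static final long CHECKSUM_CHUNK = 512L;

    /** Smallest multiple of CHECKSUM_CHUNK that is >= max(fileSize, configured). */
    static long blockSizeFor(long fileSize, long configuredBlockSize) {
        long needed = Math.max(fileSize, configuredBlockSize);
        return ((needed + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK) * CHECKSUM_CHUNK;
    }

    // One background thread: the main load flow submits and moves on.
    private final ExecutorService copyExecutor = Executors.newSingleThreadExecutor();

    /**
     * Schedules the copy and returns immediately. The destination here is a
     * local path standing in for HDFS; the returned future yields the block
     * size that would be passed to the remote create call.
     */
    Future<Long> copyAsync(Path localFile, Path target, long configuredBlockSize) {
        return copyExecutor.submit(() -> {
            long size = Files.size(localFile);
            long blockSize = blockSizeFor(size, configuredBlockSize);
            // With Hadoop (not executed here), open the target with blockSize:
            // fs.create(remotePath, true, bufferSize, replication, blockSize)
            Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(localFile); // local staging file is no longer needed
            return blockSize;
        });
    }

    void shutdown() {
        copyExecutor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("part-0", ".carbondata");
        Files.write(src, new byte[3000]); // file larger than the configured block
        Path dst = Files.createTempFile("remote-part-0", ".carbondata");

        LocalToRemoteCopier copier = new LocalToRemoteCopier();
        Future<Long> pending = copier.copyAsync(src, dst, 2048L);
        // ... the load would keep writing the next file here ...
        System.out.println("block size used: " + pending.get()); // block size used: 3072
        copier.shutdown();
    }
}
```

    Growing the block size to cover the file, rather than splitting the file,
is what keeps one carbondata file inside one block. The single-thread
executor is just one simple way to keep the copy off the loading thread.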
