[GitHub] carbondata issue #2314: [CARBONDATA-2309][DataLoad] Add strategy to generate...

kumarvishal09 Mon, 21 May 2018 05:28:56 -0700

Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2314
  
    @ndwangsen This PR will help concurrent query performance because the number of tasks will be smaller, but it will hurt data loading performance because it will not use all the nodes and data locality is not considered.
    
    However, adding a property is not the right approach. It is better to expose a single data load strategy interface with multiple implementations: for example, one that distributes the data loading across the nodes (the existing behavior) and one for your PR's scenario, where the user wants to distribute the data based on size. Please expose one data loading strategy interface and a concrete implementation for each type (existing + size-based distribution), so that users can choose the strategy that suits their use case through configuration or a load parameter. If the user does not pass any strategy, the default implementation should be used. This way, users can also add custom strategies for their own use cases.
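A minimal Java sketch of the suggested design follows. Every name in it is hypothetical and is not an existing CarbonData API: `DataLoadStrategy`, `NodeDistributionStrategy`, `SizeBasedStrategy`, `DataLoadStrategies`, the `load_strategy` option key, and the 1 GiB default budget are all illustrative. It shows one interface, two implementations (spread blocks over every node vs. fill nodes up to a size budget so fewer tasks are created), and a resolver that falls back to the default when no strategy is passed and lets users register custom strategies. For brevity, neither implementation considers data locality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical input unit: one block of source data and its size in bytes.
final class Block {
    final String id;
    final long sizeBytes;

    Block(String id, long sizeBytes) {
        this.id = id;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical strategy interface: decide which node loads which blocks.
interface DataLoadStrategy {
    Map<String, List<Block>> assign(List<Block> blocks, List<String> nodes);
}

// Default sketch: spread blocks round-robin over every node (ignores data locality).
final class NodeDistributionStrategy implements DataLoadStrategy {
    @Override
    public Map<String, List<Block>> assign(List<Block> blocks, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        Map<String, List<Block>> plan = new LinkedHashMap<>();
        for (int i = 0; i < blocks.size(); i++) {
            String node = nodes.get(i % nodes.size());
            plan.computeIfAbsent(node, k -> new ArrayList<>()).add(blocks.get(i));
        }
        return plan;
    }
}

// Size-based sketch: fill one node up to a byte budget before using the next,
// so small loads touch fewer nodes (fewer tasks). The last node takes any overflow.
final class SizeBasedStrategy implements DataLoadStrategy {
    private final long bytesPerNode;

    SizeBasedStrategy(long bytesPerNode) {
        this.bytesPerNode = bytesPerNode;
    }

    @Override
    public Map<String, List<Block>> assign(List<Block> blocks, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        Map<String, List<Block>> plan = new LinkedHashMap<>();
        int nodeIdx = 0;
        long used = 0;
        for (Block b : blocks) {
            boolean full = used > 0 && used + b.sizeBytes > bytesPerNode;
            if (full && nodeIdx < nodes.size() - 1) {
                nodeIdx++;
                used = 0;
            }
            plan.computeIfAbsent(nodes.get(nodeIdx), k -> new ArrayList<>()).add(b);
            used += b.sizeBytes;
        }
        return plan;
    }
}

// Hypothetical resolver: pick a strategy by name from the load options,
// falling back to the default when none is given; users may register their own.
final class DataLoadStrategies {
    private static final Map<String, Supplier<DataLoadStrategy>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("node_distribution", NodeDistributionStrategy::new);
        REGISTRY.put("size_based", () -> new SizeBasedStrategy(1L << 30)); // assumed 1 GiB budget
    }

    static void register(String name, Supplier<DataLoadStrategy> factory) {
        REGISTRY.put(name, factory);
    }

    static DataLoadStrategy resolve(Map<String, String> loadOptions) {
        String name = loadOptions.get("load_strategy"); // hypothetical option key
        if (name == null) {
            return new NodeDistributionStrategy();
        }
        Supplier<DataLoadStrategy> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown load strategy: " + name);
        }
        return factory.get();
    }
}
```

With this shape, a user could plug in a custom strategy via `DataLoadStrategies.register(...)` and select it by name in the load options, instead of the behavior being switched by a single global property.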
