[GitHub] carbondata issue #2850: [WIP] Added concurrent reading through SDK

xuchuanyin Wed, 24 Oct 2018 19:13:48 -0700

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2850
  
    Hmm, but in your implementation, most of the work (the multi-thread 
handling) has to be done by the user. CarbonData itself only splits the input 
data and returns multiple readers. If this is the solution, why not just tell 
the user to create multiple CarbonReaders, passing only part of the input dir 
to each one?
    
    In addition to my proposal, I think we can add a buffer for the records. 
When `CarbonReader.next` is called, we can retrieve the record from the buffer 
and refill the buffer asynchronously. When `CarbonReader.hasNext` is called, we 
can first check the buffer; if it is empty, we then check the recordReader and 
fill the buffer asynchronously.
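
    To make the buffering idea concrete, here is a rough sketch of what I mean. 
It is not CarbonData code: `BufferedRecordReader` is a hypothetical wrapper, a 
plain `java.util.Iterator` stands in for the underlying recordReader, and the 
buffer capacity is an arbitrary parameter. It also assumes records are never 
null. A background thread keeps the buffer filled; `hasNext` looks at the 
buffer and only waits when the buffer is empty and the source is not yet 
exhausted.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical wrapper (not part of the CarbonData SDK) that prefetches records. */
final class BufferedRecordReader<T> implements Iterator<T> {
    // Marker placed in the buffer once the source has no more records.
    private static final Object END = new Object();

    private final BlockingQueue<Object> buffer;
    // Element already taken from the buffer by hasNext() but not yet returned.
    private Object peeked;

    BufferedRecordReader(Iterator<T> source, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        Thread filler = new Thread(() -> {
            try {
                try {
                    // put() blocks while the buffer is full, so at most
                    // `capacity` records are read ahead of the consumer.
                    while (source.hasNext()) {
                        buffer.put(source.next());
                    }
                } finally {
                    // Always signal the end so the consumer does not wait forever.
                    // A real implementation would also hand a source failure
                    // over to the consumer instead of just ending the stream.
                    buffer.put(END);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "record-buffer-filler");
        filler.setDaemon(true);
        filler.start();
    }

    @Override
    public boolean hasNext() {
        if (peeked == null) {
            try {
                // Returns at once if the buffer has data; otherwise waits
                // for the filler thread to deliver a record or the END marker.
                peeked = buffer.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return peeked != END;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        @SuppressWarnings("unchecked")
        T record = (T) peeked;
        peeked = null; // the next hasNext() call pulls a fresh element
        return record;
    }
}
```

    With something like this the user still writes a plain single-threaded 
`hasNext`/`next` loop, and the read-ahead happens inside the reader. The 
wrapper itself is meant to be consumed from one thread only.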

---
