Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2056#discussion_r174705630
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java ---
    @@ -141,13 +141,18 @@ public void initialize() throws MemoryException {
         semaphore = new Semaphore(parameters.getNumberOfCores());
       }
     
    -  private UnsafeCarbonRowPage createUnsafeRowPage() throws MemoryException {
    +  private UnsafeCarbonRowPage createUnsafeRowPage()
    +      throws MemoryException, CarbonSortKeyAndGroupByException {
         MemoryBlock baseBlock =
             UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize);
         boolean isMemoryAvailable =
             UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size());
         if (isMemoryAvailable) {
           UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size());
    +    } else {
    +      LOGGER.info("trigger in-memory merge and spill for table " + parameters.getTableName());
    --- End diff --
    
    I think this approach is good for the case where the sort memory size is much smaller than the input data size. For example, if sort memory is 10GB and input data size is 25GB, then it works well. But if sort memory is 10GB and input size is 15GB, the new approach will spill more data to disk than necessary, so in that case the current approach is better.
    So why not let the user configure how much data to spill?
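    As a rough sketch of that idea (hypothetical class and parameter names, not CarbonData's actual API; a real implementation would read the value from a table or system property), a user-configured fraction could decide how many bytes to spill once sort memory is exhausted, instead of always spilling everything:
    
    ```java
    // Hypothetical sketch: spill only a user-configured fraction of sort memory
    // when memory runs out, so inputs that only slightly exceed sort memory are
    // not forced to spill all in-memory pages to disk.
    public class SpillPolicy {
        private final long sortMemoryBytes;  // total sort memory budget
        private final double spillFraction;  // user-configured, e.g. 0.5
    
        public SpillPolicy(long sortMemoryBytes, double spillFraction) {
            this.sortMemoryBytes = sortMemoryBytes;
            this.spillFraction = spillFraction;
        }
    
        /** Number of bytes to merge and spill once memory is exhausted. */
        public long bytesToSpill() {
            return (long) (sortMemoryBytes * spillFraction);
        }
    
        public static void main(String[] args) {
            // 10 GB sort memory; user asks to spill half of it on pressure.
            SpillPolicy policy = new SpillPolicy(10L << 30, 0.5);
            System.out.println(policy.bytesToSpill()); // 5368709120 (5 GB)
        }
    }
    ```
    
    With spillFraction = 1.0 this degenerates to the behavior in the diff (spill everything), so the default could preserve the new approach while letting users tune it for workloads that only slightly exceed sort memory.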

