Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2056#discussion_r177662316
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java ---
    @@ -141,13 +141,18 @@ public void initialize() throws MemoryException {
         semaphore = new Semaphore(parameters.getNumberOfCores());
       }
     
    -  private UnsafeCarbonRowPage createUnsafeRowPage() throws MemoryException {
    +  private UnsafeCarbonRowPage createUnsafeRowPage()
    +      throws MemoryException, CarbonSortKeyAndGroupByException {
         MemoryBlock baseBlock =
             UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize);
         boolean isMemoryAvailable =
             UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size());
         if (isMemoryAvailable) {
           UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size());
    +    } else {
    +      LOGGER.info("trigger in-memory merge and spill for table " + parameters.getTableName());
    --- End diff ---
    
    Oh, I think the configuration will be difficult to understand. If the user configures 10GB, the spilled data files may be smaller than that, since the files are compressed.
    
    Besides, we would have to validate this configuration to make sure it is less than the sort memory.
    
    Most importantly, the sort memory is shared by all concurrent data loads, so such a configuration would not tell the user how it actually affects the real behavior.
    
    I prefer to keep it simple for now, and the result is quite clear: in the worst scenario, we only have to spill an extra #sizeOfSortMemory of data to disk compared with the previous implementation. And in the general scenario, we can spill bigger files to disk instead of many smaller ones.
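
    The allocate-or-spill decision under discussion can be sketched roughly as below. This is a simplified model with hypothetical names (`SortMemoryPool`, `allocateOrSpill`, `spillToDisk`), not CarbonData's actual API; it only illustrates the pattern of reserving sort memory when available and triggering an in-memory merge and spill otherwise:

    ```java
    // Hypothetical sketch: a bounded pool where a page either reserves
    // memory (the "dummy" allocation in the diff) or forces a spill.
    public class SortMemoryPool {
      private final long capacity;
      private long used;

      public SortMemoryPool(long capacity) {
        this.capacity = capacity;
      }

      public boolean isMemoryAvailable(long size) {
        return used + size <= capacity;
      }

      /** Returns true if the page fit in memory, false if it forced a spill. */
      public boolean allocateOrSpill(long pageSize) {
        if (isMemoryAvailable(pageSize)) {
          used += pageSize;   // reserve memory for the new row page
          return true;
        }
        spillToDisk();        // merge in-memory pages and write one larger file
        used = pageSize;      // after the spill, only the new page is resident
        return false;
      }

      private void spillToDisk() {
        // In the real patch this merges the in-memory pages into a single
        // sorted temp file, instead of writing many small unmerged files.
        used = 0;
      }

      public long getUsed() {
        return used;
      }
    }
    ```

    Under this model the worst case matches the comment above: at most one pool's worth (#sizeOfSortMemory) of extra data is spilled compared with keeping everything in memory, and each spill produces one large merged file rather than many small ones.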

