Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2056#discussion_r177662316
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/UnsafeSortDataRows.java
---
@@ -141,13 +141,18 @@ public void initialize() throws MemoryException {
semaphore = new Semaphore(parameters.getNumberOfCores());
}
-  private UnsafeCarbonRowPage createUnsafeRowPage() throws MemoryException {
+  private UnsafeCarbonRowPage createUnsafeRowPage()
+      throws MemoryException, CarbonSortKeyAndGroupByException {
    MemoryBlock baseBlock =
        UnsafeMemoryManager.allocateMemoryWithRetry(this.taskId, inMemoryChunkSize);
    boolean isMemoryAvailable =
        UnsafeSortMemoryManager.INSTANCE.isMemoryAvailable(baseBlock.size());
    if (isMemoryAvailable) {
      UnsafeSortMemoryManager.INSTANCE.allocateDummyMemory(baseBlock.size());
+   } else {
+     LOGGER.info("trigger in-memory merge and spill for table "
+         + parameters.getTableName());
--- End diff --
Oh, I think the configuration would be difficult to understand. If the user
configures 10 GB, the spilled data files may be smaller than that, since the
files are compressed.
Besides, we would have to validate this configuration to make sure it is less
than the sort memory size.
Most importantly, the sort memory is shared by all data loading jobs, so such
a configuration may not tell the user how it will affect the real behavior.
I prefer to keep it simple for now, and the result is quite clear: in the
worst scenario, we only have to spill an extra #sizeOfSortMemory of data to
disk compared with the previous implementation. And in the general scenario,
we can spill bigger files to disk instead of many smaller ones.
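To make the trade-off concrete, here is a minimal, self-contained sketch (not the actual CarbonData code) of the allocate-or-spill decision discussed above. The class name `SpillDecisionSketch`, the method `reserveInSortMemory`, and the sizes used are all hypothetical; the real logic lives in `UnsafeSortMemoryManager` and `UnsafeSortDataRows`.

```java
// Illustrative sketch of the allocate-or-spill decision: a new row page is
// reserved in shared sort memory when it fits; otherwise an in-memory merge
// and spill to disk is triggered, which frees the sort memory again.
// All names and sizes here are hypothetical, for illustration only.
public class SpillDecisionSketch {

  /** True when the page fits in the remaining sort memory; false when the
   *  spill path (merge pages in memory, write one bigger file) would run. */
  public static boolean reserveInSortMemory(long sortMemoryFree, long pageSize) {
    return sortMemoryFree >= pageSize;
  }

  public static void main(String[] args) {
    long sortMemory = 1024L;  // hypothetical shared sort memory, in MB
    long pageSize = 256L;     // hypothetical working-memory page size, in MB

    long free = sortMemory;
    int spills = 0;
    // Four 256 MB pages fit in 1024 MB; the fifth triggers merge-and-spill.
    for (int i = 0; i < 5; i++) {
      if (reserveInSortMemory(free, pageSize)) {
        free -= pageSize;      // dummy reservation, as in the diff
      } else {
        spills++;              // spill path: one bigger file written to disk
        free = sortMemory;     // sort memory is freed after the spill
      }
    }
    System.out.println("spills=" + spills);
  }
}
```

This is why the worst case only costs an extra #sizeOfSortMemory of spilled data: at most one full sort memory's worth of pages is merged and written out per spill, and each spill produces one larger file rather than many page-sized ones.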
---