This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new eff2977 [CARBONDATA-3519]Made optimizations in write step to avoid
unnecessary memory blk allocation/free
eff2977 is described below
commit eff297769bc0513cdf4e7c3d2fd3f57c5ca60df1
Author: Venu Reddy <[email protected]>
AuthorDate: Fri Dec 20 18:27:34 2019 +0530
[CARBONDATA-3519]Made optimizations in write step to avoid unnecessary
memory blk allocation/free
Issue-1:
Context:
For a string column with local dictionary enabled, a column page of
UnsafeFixLengthColumnPage with datatype DataTypes.BYTE_ARRAY is created for
encodedPage along with regular actualPage of UnsafeVarLengthColumnPage. We have
capacity field in the UnsafeFixLengthColumnPage. And this field indicates the
capacity of allocated
memoryBlock for the page. ensureMemory() method gets called while adding
rows to check if totalLength + requestSize > capacity to allocate a new
memoryBlock. If there is no room to add the next row, allocates a new block,
copy the old context(prev rows) and free the old memoryBlock.
Problem:
While, UnsafeFixLengthColumnPage with with datatype DataTypes.BYTE_ARRAY is
created for encodedPage, we have not assigned the capacity field with allocated
memory block size. Hence, for each add row to tablePage, ensureMemory() check
always fails, allocates a new column page memoryBlock, copy the old
context(prev rows) and free the old memoryBlock.
This allocation of new memoryBlock and free of old memoryBlock happens for
each row row addition for the string columns with local dictionary.
Solution:
We can set the capacity to allocated size.
Issue-2:
Context:
InVarLengthColumnPageBase, we have a rowOffset column page of
UnsafeFixLengthColumnPage of datatype INT to maintain the data offset to each
row of variable length columns. This rowOffset page allocates to be size of
page.
Problem:
If we have 10 rows in the page, we need 11 rows for its rowOffset page.
Because we always keep 0 as offset to 1st row. So an additional row is required
for rowOffset page. Otherwise, ensureMemory() check always fails for the last
row(10th row in this case) of data and allocates a new rowOffset page
memoryBlock, copy the old context(prev rows) and free the old memoryBlock. This
can happen for the string columns with local dictionary, direct dictionary
columns, global disctionary columns.
Solution:
We can allocate for 1 additional row of rowOffset page.
This closes #3524
---
.../carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java | 4 ++--
.../carbondata/core/datastore/page/VarLengthColumnPageBase.java | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git
a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
index 13c382f..4ef5e5d 100644
---
a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
+++
b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
@@ -92,8 +92,8 @@ public class UnsafeFixLengthColumnPage extends ColumnPage {
this.eachRowSize = eachRowSize;
totalLength = 0;
if (columnPageEncoderMeta.getStoreDataType() == DataTypes.BYTE_ARRAY) {
- memoryBlock =
- UnsafeMemoryManager.allocateMemoryWithRetry(taskId, (long) pageSize
* eachRowSize);
+ capacity = pageSize * eachRowSize;
+ memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId,
capacity);
baseAddress = memoryBlock.getBaseObject();
baseOffset = memoryBlock.getBaseOffset();
}
diff --git
a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
index 1b1f8f6..01f1d55 100644
---
a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
+++
b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
@@ -73,7 +73,7 @@ public abstract class VarLengthColumnPageBase extends
ColumnPage {
try {
rowOffset = ColumnPage.newPage(
new ColumnPageEncoderMeta(spec, DataTypes.INT,
columnPageEncoderMeta.getCompressorName()),
- pageSize);
+ pageSize + 1);
} catch (MemoryException e) {
throw new RuntimeException(e);
}