[
https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Venugopal Reddy K updated CARBONDATA-3519:
------------------------------------------
Description:
+*Issue-1:*+
{color:#0747a6}*Context:*{color}
For a string column with local dictionary enabled, a column page of
`{color:#de350b}UnsafeFixLengthColumnPage{color}` with datatype
`{color:#de350b}{{DataTypes.BYTE_ARRAY}}{color}` is created for
`{color:#de350b}{{encodedPage}}{color}` along with regular
`{color:#de350b}{{actualPage}}{color}` of
`{color:#de350b}{{UnsafeVarLengthColumnPage}}{color}`.
The `{color:#de350b}{{UnsafeFixLengthColumnPage}}{color}` has a
`{color:#de350b}*{{capacity}}*{color}` field, which indicates the capacity of
the `{color:#de350b}{{memoryBlock}}{color}` allocated for the page.
The `{{{color:#de350b}ensureMemory{color}()}}` method is called while adding
rows and checks whether
`{color:#de350b}{{totalLength + requestSize > capacity}}{color}`.
If there is no room to add the next row, it allocates a new memoryBlock,
copies the old content (previous rows) into it, and frees the old
memoryBlock.
{color:#0747a6} *Problem:*{color}
When the `{color:#de350b}{{UnsafeFixLengthColumnPage}}{color}` with datatype
`{color:#de350b}{{DataTypes.BYTE_ARRAY}}{color}` is created for
`{color:#de350b}{{encodedPage}}{color}`, the
*`{color:#de350b}{{capacity}}{color}`* field is not assigned the size of the
allocated memory block. Hence, for each row added to the tablePage, the
*ensureMemory() check always fails*: it allocates a new column page
memoryBlock, copies the old content (previous rows), and frees the old
memoryBlock. This *allocation of a new memoryBlock and freeing of the old
memoryBlock happens on every row addition* for string columns with
local dictionary.
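The effect described above can be sketched with a toy model. This is not CarbonData code: only the names ensureMemory, capacity, totalLength and memoryBlock come from the description, while the class `EncodedPageSketch`, its constructor arguments, the growth policy, and the helper `reallocationsFor` are hypothetical. It also assumes, as the description implies, that an unassigned capacity stays at 0 for the lifetime of the page.

```java
import java.util.Arrays;

// Toy model only; not the CarbonData implementation.
final class EncodedPageSketch {
    private byte[] memoryBlock;           // stands in for the unsafe memory block
    private final boolean assignCapacity; // false models the reported problem
    private int capacity;                 // assumption: stays 0 when never assigned
    private int totalLength;              // bytes written so far
    private int reallocations;            // how many times a new block was allocated

    EncodedPageSketch(int pageSize, int eachRowSize, boolean assignCapacity) {
        this.memoryBlock = new byte[pageSize * eachRowSize];
        this.assignCapacity = assignCapacity;
        if (assignCapacity) {
            this.capacity = pageSize * eachRowSize;
        }
    }

    private void ensureMemory(int requestSize) {
        if (totalLength + requestSize > capacity) {
            // Hypothetical growth policy for this sketch.
            int newSize = Math.max(2 * capacity, totalLength + requestSize);
            // Allocates a new array and copies the previous rows into it;
            // the old array is dropped, which models freeing the old block.
            memoryBlock = Arrays.copyOf(memoryBlock, Math.max(newSize, memoryBlock.length));
            if (assignCapacity) {
                capacity = newSize;
            }
            reallocations++;
        }
    }

    void putBytes(byte[] bytes) {
        ensureMemory(bytes.length);
        System.arraycopy(bytes, 0, memoryBlock, totalLength, bytes.length);
        totalLength += bytes.length;
    }

    // Adds `rows` rows of `rowSize` bytes and reports how many blocks were reallocated.
    static int reallocationsFor(int rows, int rowSize, boolean assignCapacity) {
        EncodedPageSketch page = new EncodedPageSketch(rows, rowSize, assignCapacity);
        for (int i = 0; i < rows; i++) {
            page.putBytes(new byte[rowSize]);
        }
        return page.reallocations;
    }

    public static void main(String[] args) {
        System.out.println(reallocationsFor(10, 4, false)); // capacity left unassigned
        System.out.println(reallocationsFor(10, 4, true));  // capacity set at allocation
    }
}
```

In this model, leaving capacity unassigned triggers one reallocation per added row, while assigning it at allocation time triggers none for a page that fits in the initial block.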
+*Issue-2:*+
{color:#0747a6}*Context:*{color}
In `{color:#de350b}VarLengthColumnPageBase{color}`, we have a
`{color:#de350b}rowOffset{color}` column page of
`{color:#de350b}UnsafeFixLengthColumnPage{color}` of datatype
`{color:#de350b}INT{color}`
to maintain the data offset of {color:#172b4d}each{color} row of variable
length columns. This `{color:#de350b}rowOffset{color}` page is allocated with
the same number of rows as the page.
{color:#0747a6} *Problem:*{color}
{color:#172b4d}If the page has 10 rows, its rowOffset page needs 11 rows,
because 0 is always stored as the offset of the 1st row and each row then
stores its end offset at index rowId + 1 (see the code pasted below for
reference). Otherwise, the *ensureMemory() check always fails for the last
row* (the 10th row in this case) and *allocates a new rowOffset page
memoryBlock, copies the old content (previous rows), and frees the old
memoryBlock*. This *can happen for string columns with local dictionary,
direct dictionary columns, and global dictionary columns*.{color}
{code:java}
public abstract class VarLengthColumnPageBase extends ColumnPage {
  ...
  @Override
  public void putBytes(int rowId, byte[] bytes) {
    ...
    if (rowId == 0) {
      rowOffset.putInt(0, 0); // offset to 1st row is 0
    }
    // entry rowId + 1 holds the end offset of this row
    rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
    putBytesAtRow(rowId, bytes);
    totalLength += bytes.length;
  }
  ...
}
{code}
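The off-by-one in the rowOffset allocation can be illustrated with a small model of the offset bookkeeping. It is a sketch under assumptions, not CarbonData code: the class `RowOffsetSketch`, the int-array storage, the growth policy, and the helper `reallocationsFor` are hypothetical. Only the putBytes logic mirrors the excerpt above.

```java
import java.util.Arrays;

// Toy model of the rowOffset page; not the CarbonData implementation.
final class RowOffsetSketch {
    private int[] rowOffset;     // stands in for the INT rowOffset page
    private int reallocations;   // how many times the offset storage was reallocated

    RowOffsetSketch(int allocatedEntries) {
        this.rowOffset = new int[allocatedEntries];
    }

    private void putInt(int index, int value) {
        if (index >= rowOffset.length) {
            // Simplified stand-in for the ensureMemory() check: allocate a
            // larger array, copy the previous offsets, drop the old array.
            rowOffset = Arrays.copyOf(rowOffset, Math.max(2 * rowOffset.length, index + 1));
            reallocations++;
        }
        rowOffset[index] = value;
    }

    void putBytes(int rowId, byte[] bytes) {
        if (rowId == 0) {
            putInt(0, 0); // offset to 1st row is 0
        }
        // entry rowId + 1 holds the end offset of this row
        putInt(rowId + 1, rowOffset[rowId] + bytes.length);
    }

    // Writes `rows` rows into a rowOffset page allocated with `allocatedEntries` entries.
    static int reallocationsFor(int rows, int allocatedEntries) {
        RowOffsetSketch page = new RowOffsetSketch(allocatedEntries);
        for (int rowId = 0; rowId < rows; rowId++) {
            page.putBytes(rowId, new byte[3]);
        }
        return page.reallocations;
    }

    public static void main(String[] args) {
        System.out.println(reallocationsFor(10, 10)); // allocated with page size
        System.out.println(reallocationsFor(10, 11)); // allocated with page size + 1
    }
}
```

With 10 rows and 10 allocated entries, the write at index 10 for the last row forces one reallocation. Allocating one extra entry avoids it.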
> A new column page MemoryBlock is allocated at each row addition to table page
> if having string column with local dictionary enabled.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-3519
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3519
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Reporter: Venugopal Reddy K
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)