[
https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Venugopal Reddy K updated CARBONDATA-3519:
------------------------------------------
Description:
+*Issue-1:*+
*Context:*
For a string column with local dictionary enabled, a column page of type
{{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}} is created
for {{encodedPage}}, alongside the regular {{actualPage}} of type
{{UnsafeVarLengthColumnPage}}.
{{UnsafeFixLengthColumnPage}} has a {{capacity}} field that indicates the
capacity of the {{memoryBlock}} allocated for the page.
The {{ensureMemory()}} method is called while adding rows to check whether
{{totalLength + requestSize > capacity}}. If so, there is no room for the next
row, so it allocates a new memoryBlock, copies the old content (previous rows)
into it, and frees the old memoryBlock.
*Problem:*
When an {{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}} is
created for {{encodedPage}}, the {{capacity}} field is not assigned the size of
the allocated memory block. Hence, for each row added to the tablePage, *the
ensureMemory() check always fails*: a new column page memoryBlock is allocated,
the old content (previous rows) is copied, and the old memoryBlock is freed.
This *allocation of a new memoryBlock and freeing of the old memoryBlock
happens for every row addition* for string columns with local dictionary.
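The effect described in Issue-1 can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: {{CapacitySketch}}, its constructor flag and its {{reallocations}} counter are hypothetical names, a plain byte array stands in for the memoryBlock, and the untracked mode assumes {{capacity}} stays at 0 for the life of the page, as the report describes. The real class may differ in detail.

```java
import java.util.Arrays;

// Hypothetical stand-in for a fixed-length column page; not CarbonData's real class.
class CapacitySketch {
  private byte[] block;                 // stands in for memoryBlock
  private int capacity;                 // bytes the page believes it can hold
  private int totalLength;              // bytes written so far
  private final boolean trackCapacity;  // false models the reported problem
  int reallocations;                    // number of times a new block was allocated

  CapacitySketch(int allocatedBytes, boolean trackCapacity) {
    this.block = new byte[allocatedBytes];
    this.trackCapacity = trackCapacity;
    // Assumption: when capacity is never assigned, it keeps its default of 0.
    this.capacity = trackCapacity ? allocatedBytes : 0;
  }

  void putBytes(byte[] row) {
    ensureMemory(row.length);
    System.arraycopy(row, 0, block, totalLength, row.length);
    totalLength += row.length;
  }

  private void ensureMemory(int requestSize) {
    if (totalLength + requestSize > capacity) {
      int needed = totalLength + requestSize;
      int newSize = Math.max(block.length, Math.max(2 * capacity, needed));
      // Allocate a new block and copy the previous rows; the old block is dropped.
      block = Arrays.copyOf(block, newSize);
      reallocations++;
      if (trackCapacity) {
        capacity = newSize;
      }
    }
  }

  public static void main(String[] args) {
    byte[] row = new byte[4];
    CapacitySketch untracked = new CapacitySketch(40, false);
    CapacitySketch tracked = new CapacitySketch(40, true);
    for (int i = 0; i < 10; i++) {
      untracked.putBytes(row);
      tracked.putBytes(row);
    }
    System.out.println("capacity not assigned: " + untracked.reallocations + " reallocations");
    System.out.println("capacity assigned:     " + tracked.reallocations + " reallocations");
  }
}
```

In this model the block is large enough for all ten rows in both cases; only the page that records its capacity avoids the allocate, copy and free cycle on every row.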
+*Issue-2:*+
*Context:*
In {{VarLengthColumnPageBase}}, there is a {{rowOffset}} column page of type
{{UnsafeFixLengthColumnPage}} with datatype {{INT}} that maintains the data
offset of each row of variable length columns. This {{rowOffset}} page is
allocated with the same number of rows as the page (the page size).
*Problem:*
If a page has 10 rows, its rowOffset page needs 11 rows, because 0 is always
stored as the offset of the 1st row, so one additional row is required (the
code pasted below shows this). Otherwise, the ensureMemory() check always fails
for the last row of data (the 10th row in this case): a new rowOffset page
memoryBlock is allocated, the old content (previous rows) is copied, and the
old memoryBlock is freed. This can happen for string columns with local
dictionary, direct dictionary columns, and global dictionary columns.
{code:java}
public abstract class VarLengthColumnPageBase extends ColumnPage {
  ...
  @Override
  public void putBytes(int rowId, byte[] bytes) {
    ...
    // Offset 0 is always stored for the first row.
    if (rowId == 0) {
      rowOffset.putInt(0, 0);
    }
    // Entry rowId + 1 holds the end offset of this row, so N rows need N + 1 entries.
    rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
    putBytesAtRow(rowId, bytes);
    totalLength += bytes.length;
  }
  ...
}
{code}
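The off-by-one in Issue-2 can be checked with a small standalone sketch of the same offset bookkeeping. It assumes a plain {{int[]}} in place of the rowOffset page; {{RowOffsetSketch}} and {{buildOffsets}} are hypothetical names used only for illustration.

```java
// Hypothetical illustration of the rowOffset bookkeeping; not CarbonData's real class.
class RowOffsetSketch {

  // Mirrors the putBytes excerpt: entry 0 is 0, entry rowId + 1 is the end of row rowId.
  static int[] buildOffsets(byte[][] rows) {
    int[] rowOffset = new int[rows.length + 1];  // one extra entry for the leading 0
    for (int rowId = 0; rowId < rows.length; rowId++) {
      if (rowId == 0) {
        rowOffset[0] = 0;
      }
      rowOffset[rowId + 1] = rowOffset[rowId] + rows[rowId].length;
    }
    return rowOffset;
  }

  public static void main(String[] args) {
    byte[][] rows = new byte[10][];
    for (int i = 0; i < rows.length; i++) {
      rows[i] = new byte[i + 1];  // row i holds i + 1 bytes
    }
    int[] offsets = buildOffsets(rows);
    System.out.println("rows: " + rows.length + ", offset entries: " + offsets.length);
    System.out.println("last offset (total bytes): " + offsets[offsets.length - 1]);
  }
}
```

Writing the 10th row touches index 10, the 11th entry, which is why a rowOffset page sized to exactly the page's row count runs out of room on the last row.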
was:
*Context:*
For a string column with local dictionary enabled, a column page of type
{{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}} is created
for {{encodedPage}}, alongside the regular {{actualPage}} of type
{{UnsafeVarLengthColumnPage}}.
{{UnsafeFixLengthColumnPage}} has a {{capacity}} field that indicates the
capacity of the {{memoryBlock}} allocated for the page. The {{ensureMemory()}}
method is called while adding rows to check whether
{{totalLength + requestSize > capacity}}. If there is no room to add the next
row, it allocates a new memoryBlock, copies the old content (previous rows),
and frees the old memoryBlock.
*Issues:*
# When an {{UnsafeFixLengthColumnPage}} with datatype {{DataTypes.BYTE_ARRAY}}
is created for {{encodedPage}}, the {{capacity}} field is not assigned the size
of the allocated memory block. Hence, for each row added to the tablePage, the
ensureMemory() check always fails: a new column page memoryBlock is allocated,
the old content (previous rows) is copied, and the old memoryBlock is freed.
This allocation of a new memoryBlock and freeing of the old memoryBlock happens
at each row addition for string columns with local dictionary enabled.
# In {{VarLengthColumnPageBase}}, there is a {{rowOffset}} column page of type
{{UnsafeFixLengthColumnPage}} that maintains the offset of each row of variable
length columns. This {{rowOffset}} page is
> A new column page MemoryBlock is allocated at each row addition to table page
> if having string column with local dictionary enabled.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-3519
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3519
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Reporter: Venugopal Reddy K
> Priority: Minor
>