[
https://issues.apache.org/jira/browse/IMPALA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946708#comment-17946708
]
ASF subversion and git services commented on IMPALA-13963:
----------------------------------------------------------
Commit bc0a92c5eddeefbe53093fe6777c4e3a472e026d in impala's branch
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bc0a92c5e ]
IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher
value
When setting the Iceberg table property 'write.parquet.page-size-bytes'
to a higher value, inserting into the table crashes Impala:
create table lineitem_iceberg_comment
stored as iceberg
tblproperties("write.parquet.page-size-bytes"="1048576")
as select l_comment from tpch_parquet.lineitem;
The Impala executors crash due to memory corruption caused by a buffer
overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue(). Before
attempting to write the next value, it checks whether the total byte
size would exceed 'plain_page_size_', but the buffer into which it
writes ('values_buffer_') has length 'values_buffer_len_'.
'values_buffer_len_' is initialised in the constructor to
'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of
'plain_page_size_'. However, it is intended to have at least the same
size, as can be seen from the check in ProcessValue() or the
GrowPageSize() method. The error does not usually surface because
'plain_page_size_' has the same default value, 'DEFAULT_DATA_PAGE_SIZE'.
'values_buffer_' is also used for DICTIONARY encoding, but that takes
care of growing it as necessary.
This change fixes the problem by initialising 'values_buffer_len_' to
the value of 'plain_page_size_' in the constructor.
This leads to exposing another bug: in BitWriter::PutValue(), when we
check whether the next element fits in the buffer, we multiply
'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int.
This happens with values that we already use in our tests.
This change widens 'max_bytes_' to int64_t, so multiplying it by 8
(converting from bytes to bits) can no longer overflow.
Testing:
- Added an EE test in iceberg-insert.test that reproduced the error.
Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
Reviewed-on: http://gerrit.cloudera.org:8080/22777
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Noemi Pap-Takacs <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
> Crash when setting 'write.parquet.page-size-bytes' to a higher value
> --------------------------------------------------------------------
>
> Key: IMPALA-13963
> URL: https://issues.apache.org/jira/browse/IMPALA-13963
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
> Labels: impala, impala-iceberg
>
> When setting the Iceberg table property {{write.parquet.page-size-bytes}} to
> a higher value, inserting into the table crashes Impala.
> Repro:
> {code:java}
> create table lineitem_iceberg_comment stored as iceberg as select l_comment
> from tpch_parquet.lineitem union all select l_comment from
> tpch_parquet.lineitem;
> alter table lineitem_iceberg_comment set
> tblproperties("write.parquet.page-size-bytes"="6000000");
> insert into lineitem_iceberg_comment select l_comment from
> tpch_parquet.lineitem union all select l_comment from tpch_parquet.lineitem;
> {code}