[ https://issues.apache.org/jira/browse/IMPALA-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946708#comment-17946708 ]

ASF subversion and git services commented on IMPALA-13963:
----------------------------------------------------------

Commit bc0a92c5eddeefbe53093fe6777c4e3a472e026d in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bc0a92c5e ]

IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher 
value

When setting the Iceberg table property 'write.parquet.page-size-bytes'
to a higher value, inserting into the table crashes Impala:

  create table lineitem_iceberg_comment
  stored as iceberg
  tblproperties("write.parquet.page-size-bytes"="1048576")
  as select l_comment from tpch_parquet.lineitem;

The impala executors crash because of memory corruption caused by buffer
overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue(). Before
attempting to write the next value, it checks whether the total byte
size would exceed 'plain_page_size_', but the buffer into which it
writes ('values_buffer_') has length 'values_buffer_len_'.
'values_buffer_len_' is initialised in the constructor to
'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of
'plain_page_size_'. However, it is intended to have at least the same
size, as can be seen from the check in ProcessValue() or the
GrowPageSize() method. The error does not usually surface because
'plain_page_size_' has the same default value, 'DEFAULT_DATA_PAGE_SIZE'.

'values_buffer_' is also used for DICTIONARY encoding, but that takes
care of growing it as necessary.

This change fixes the problem by initialising 'values_buffer_len_' to
the value of 'plain_page_size_' in the constructor.

Fixing this exposes another bug: in BitWriter::PutValue(), when we
check whether the next element fits in the buffer, we multiply
'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int.
This already happens with values used in our tests.

This change widens 'max_bytes_' to int64_t, so multiplying it by 8
(converting from bytes to bits) is now safe.

Testing:
  - Added an EE test in iceberg-insert.test that reproduces the error.

Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
Reviewed-on: http://gerrit.cloudera.org:8080/22777
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Noemi Pap-Takacs <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>


> Crash when setting 'write.parquet.page-size-bytes' to a higher value
> --------------------------------------------------------------------
>
>                 Key: IMPALA-13963
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13963
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>              Labels: impala, impala-iceberg
>
> When setting the Iceberg table property {{write.parquet.page-size-bytes}} to 
> a higher value, inserting into the table crashes Impala.
> Repro:
> {code:sql}
> create table lineitem_iceberg_comment stored as iceberg as select l_comment 
> from tpch_parquet.lineitem union all select l_comment from 
> tpch_parquet.lineitem;
> alter table lineitem_iceberg_comment set 
> tblproperties("write.parquet.page-size-bytes"="6000000");
> insert into lineitem_iceberg_comment select l_comment from 
> tpch_parquet.lineitem union all select l_comment from tpch_parquet.lineitem;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
