[ https://issues.apache.org/jira/browse/PARQUET-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553013#comment-15553013 ]
Uwe L. Korn commented on PARQUET-739:
-------------------------------------
Can you post some example code to reproduce the problem?
> Read after free with uncompressed page
> --------------------------------------
>
> Key: PARQUET-739
> URL: https://issues.apache.org/jira/browse/PARQUET-739
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Florian Scheibner
> Assignee: Florian Scheibner
>
> Reading two parquet files in parallel led to memory corruption that caused
> a crash. The columns are RLE-dictionary-encoded strings in an uncompressed
> page, created with parquet-mr. AddressSanitizer (-fsanitize=address) traced
> the issue to a use-after-free:
> {code}
> =================================================================
> ==81678==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060001088c0 at pc 0x000003dbd42b bp 0x7fffe30fbe00 sp 0x7fffe30fbdf8
> READ of size 16 at 0x6060001088c0 thread T8
> #0 0x3dbd42a in int parquet::RleDecoder::GetBatchWithDict<parquet::ByteArray>(parquet::Vector<parquet::ByteArray> const&, parquet::ByteArray*, int) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3dbd42a)
> #1 0x3db8efa in parquet::DictionaryDecoder<parquet::DataType<(parquet::Type::type)6> >::Decode(parquet::ByteArray*, int) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3db8efa)
> #2 0x3d84767 in parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)6> >::ReadValues(long, parquet::ByteArray*) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3d84767)
> #3 0x3d83497 in parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)6> >::ReadBatch(int, short*, short*, parquet::ByteArray*, long*) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3d83497)
> {code}
> Initial debugging showed that the dictionary indices returned by the RLE
> decoder are garbage, so the data page was corrupted in memory. Reading the
> files in a single thread works.
> I have a ColumnReader for each column and read one element from each column
> to get a complete row.
> My guess is that some data buffer is freed and then still used for reading
> later. I have not been able to track down the source yet. Any ideas [~wesmckinn]?
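
The guess in the report (a buffer is freed while a value still points into it) can be illustrated with a small standalone sketch. This is not parquet-cpp code and not a confirmed reproduction of PARQUET-739: ByteView, ToyPageReader, CopyOut and ReadAllCopied are hypothetical names invented for the illustration. It only shows the general hazard of a non-owning value that outlives its page buffer, and the usual way to avoid it by copying before the reader advances.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-owning byte-array value: it only points
// into memory owned by someone else (here, the current page buffer).
struct ByteView {
  uint32_t len;
  const uint8_t* ptr;
};

// Hypothetical reader that owns exactly one page buffer at a time.
class ToyPageReader {
 public:
  explicit ToyPageReader(std::vector<std::string> pages)
      : pages_(std::move(pages)), next_(0) {}

  // Loads the next page and releases the previous buffer. Any ByteView
  // handed out earlier dangles after this call.
  bool NextPage() {
    if (next_ >= pages_.size()) return false;
    const std::string& src = pages_[next_++];
    current_.reset(new std::vector<uint8_t>(src.begin(), src.end()));
    return true;
  }

  // Returns a view into the current page; valid only until NextPage().
  ByteView Current() const {
    assert(current_ != nullptr);
    return ByteView{static_cast<uint32_t>(current_->size()), current_->data()};
  }

 private:
  std::vector<std::string> pages_;
  std::size_t next_;
  std::unique_ptr<std::vector<uint8_t>> current_;
};

// Safe pattern: copy the bytes out while the owning buffer is still alive.
std::string CopyOut(const ByteView& v) {
  return std::string(reinterpret_cast<const char*>(v.ptr), v.len);
}

// Reads every page, copying each value before advancing the reader.
std::vector<std::string> ReadAllCopied(ToyPageReader& reader) {
  std::vector<std::string> out;
  while (reader.NextPage()) {
    out.push_back(CopyOut(reader.Current()));
  }
  return out;
}
```

In this sketch, keeping the ByteView returned by Current() and dereferencing it after the next NextPage() call would be a heap-use-after-free of the kind AddressSanitizer reports; ReadAllCopied avoids that by copying each value first. Whether the actual bug follows this pattern is an assumption, not something the report establishes.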