[ 
https://issues.apache.org/jira/browse/PARQUET-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553013#comment-15553013
 ] 

Uwe L. Korn commented on PARQUET-739:
-------------------------------------

Can you post some example code to reproduce the problem?

> Read after free with uncompressed page
> --------------------------------------
>
>                 Key: PARQUET-739
>                 URL: https://issues.apache.org/jira/browse/PARQUET-739
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Florian Scheibner
>            Assignee: Florian Scheibner
>
> Reading two Parquet files in parallel led to memory corruption that caused 
> a crash. The columns are RLE dictionary-encoded strings in an uncompressed 
> page, created with parquet-mr. AddressSanitizer (-fsanitize=address) traced 
> the issue to a use-after-free:
> {code}
> =================================================================
> ==81678==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060001088c0 at pc 0x000003dbd42b bp 0x7fffe30fbe00 sp 0x7fffe30fbdf8
> READ of size 16 at 0x6060001088c0 thread T8
>    #0 0x3dbd42a in int parquet::RleDecoder::GetBatchWithDict<parquet::ByteArray>(parquet::Vector<parquet::ByteArray> const&, parquet::ByteArray*, int) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3dbd42a)
>    #1 0x3db8efa in parquet::DictionaryDecoder<parquet::DataType<(parquet::Type::type)6> >::Decode(parquet::ByteArray*, int) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3db8efa)
>    #2 0x3d84767 in parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)6> >::ReadValues(long, parquet::ByteArray*) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3d84767)
>    #3 0x3d83497 in parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)6> >::ReadBatch(int, short*, short*, parquet::ByteArray*, long*) (/home/fscheibner/Snowflake/ExecPlatform/bin/snowflake+0x3d83497)
> {code}
> Initial debugging showed that the dictionary indices returned by the 
> RLE decoder are garbage, so the data page appears to have been corrupted in 
> memory. Reading the files in a single thread works.
> I have a ColumnReader for each column and read one element from each column 
> to get a complete row.
> My guess is that some data buffer is freed and then later still used for 
> reading. I couldn't track down the source yet. Any ideas [~wesmckinn]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
