[ 
https://issues.apache.org/jira/browse/PARQUET-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687065#comment-15687065
 ] 

Mike Trinkala commented on PARQUET-780:
---------------------------------------

You are correct: the problem is more than just the NULL 'values' pointer. My
definition level array contained the wrong value (indicating that an entry
was present). However, when I write a simple test to reproduce the error, I
get the expected exception "Less than the number of expected rows written in
the current column chunk". The more complicated case produces this:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a07e08 in std::vector<int, std::allocator<int> >::push_back (__x=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/stl_vector.h:923
923               _M_emplace_back_aux(__x);
(gdb) bt
#0  0x00007ffff6a07e08 in std::vector<int, std::allocator<int> >::push_back (__x=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/stl_vector.h:923
#1  parquet::DictEncoder<parquet::DataType<(parquet::Type::type)6> >::Put (v=..., this=0x647840)
    at /work/parquet-cpp/src/parquet/encodings/dictionary-encoding.h:332
#2  parquet::DictEncoder<parquet::DataType<(parquet::Type::type)6> >::Put (this=<optimized out>, values=<optimized out>, num_values=<optimized out>)
    at /work/parquet-cpp/src/parquet/encodings/dictionary-encoding.h:228
#3  0x00007ffff6a0cccc in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)6> >::WriteMiniBatch (this=0x65a320, num_values=1, def_levels=<optimized out>, rep_levels=0x6661a0, values=0x0)
    at /work/parquet-cpp/src/parquet/column/writer.cc:339
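
For illustration, here is a minimal standalone sketch of the mismatch
described above. It assumes the writer derives the number of values to encode
from the definition levels, so a level that marks an entry as present while
'values' is NULL leads to a NULL dereference. CountPresentValues and
CheckBatch are hypothetical helpers written for this example; they are not
part of parquet-cpp.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not parquet-cpp API): counts the definition levels
// that mark a value as present, i.e. that equal the column's maximum level.
int64_t CountPresentValues(const int16_t* def_levels, int64_t num_levels,
                           int16_t max_def_level) {
  int64_t present = 0;
  for (int64_t i = 0; i < num_levels; ++i) {
    if (def_levels[i] == max_def_level) { ++present; }
  }
  return present;
}

// Hypothetical guard (not parquet-cpp API): rejects a batch whose definition
// levels promise at least one value while the values pointer is NULL.
template <typename T>
void CheckBatch(const int16_t* def_levels, int64_t num_levels,
                int16_t max_def_level, const T* values) {
  const int64_t present =
      CountPresentValues(def_levels, num_levels, max_def_level);
  if (present > 0 && values == nullptr) {
    throw std::invalid_argument(
        "definition levels indicate present values but 'values' is NULL");
  }
}
```

With a check of this kind in front of the encoder, the case in the backtrace
(num_values=1, values=0x0) would raise an exception instead of crashing.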
 

> WriterBatch API does not properly handle NULL values for byte array types
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-780
>                 URL: https://issues.apache.org/jira/browse/PARQUET-780
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>    Affects Versions: cpp-0.1
>            Reporter: Mike Trinkala
>
> Passing a NULL 'values' parameter into WriteBatch for a *ByteArray type will 
> cause a segfault in the dictionary encoder.
> Related: https://issues.apache.org/jira/browse/PARQUET-719



