[
https://issues.apache.org/jira/browse/PARQUET-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687065#comment-15687065
]
Mike Trinkala commented on PARQUET-780:
---------------------------------------
You are correct, it is more than just the NULL. My definition level array
contained the wrong value (indicating there was an entry). However, when I
write a simple test to reproduce the error, I get the expected exception "Less
than the number of expected rows written in the current column chunk". The
more complicated case produces this:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a07e08 in std::vector<int, std::allocator<int> >::push_back (__x=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/stl_vector.h:923
923     _M_emplace_back_aux(__x);
(gdb) bt
#0  0x00007ffff6a07e08 in std::vector<int, std::allocator<int> >::push_back (__x=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/stl_vector.h:923
#1  parquet::DictEncoder<parquet::DataType<(parquet::Type::type)6> >::Put (v=..., this=0x647840)
    at /work/parquet-cpp/src/parquet/encodings/dictionary-encoding.h:332
#2  parquet::DictEncoder<parquet::DataType<(parquet::Type::type)6> >::Put (this=<optimized out>, values=<optimized out>, num_values=<optimized out>)
    at /work/parquet-cpp/src/parquet/encodings/dictionary-encoding.h:228
#3  0x00007ffff6a0cccc in parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)6> >::WriteMiniBatch (this=0x65a320, num_values=1, def_levels=<optimized out>, rep_levels=0x6661a0, values=0x0)
    at /work/parquet-cpp/src/parquet/column/writer.cc:339
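
Frame #3 shows values=0x0 arriving together with num_values=1, which matches a
definition level that claims an entry exists while no value buffer was passed.
A caller-side guard along the following lines could turn that into an
exception instead of a crash. This is only a sketch: CountDefinedValues and
CheckValuesBuffer are hypothetical helpers, not part of parquet-cpp, and
treating a null def_levels pointer as "every entry has a value" is an
assumption made for the example.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not a parquet-cpp API): counts the entries whose
// definition level equals max_def_level, i.e. the entries that need a value.
inline int64_t CountDefinedValues(const int16_t* def_levels, int64_t num_levels,
                                  int16_t max_def_level) {
  if (def_levels == nullptr) {
    // Assumption for this sketch: no definition levels means all entries
    // carry a value.
    return num_levels;
  }
  int64_t defined = 0;
  for (int64_t i = 0; i < num_levels; ++i) {
    if (def_levels[i] == max_def_level) ++defined;
  }
  return defined;
}

// Hypothetical guard: throws if the definition levels promise at least one
// value but the values pointer is null, so no encoder ever dereferences it.
template <typename T>
void CheckValuesBuffer(const int16_t* def_levels, int64_t num_levels,
                       int16_t max_def_level, const T* values) {
  if (values == nullptr &&
      CountDefinedValues(def_levels, num_levels, max_def_level) > 0) {
    throw std::invalid_argument(
        "definition levels indicate values, but the values pointer is null");
  }
}
```

With a correct definition level (0 for the single NULL entry when the maximum
level is 1) the guard passes even though values is null; with the incorrect
level (1) it throws before any encoder call is made.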
> WriteBatch API does not properly handle NULL values for byte array types
> ------------------------------------------------------------------------
>
> Key: PARQUET-780
> URL: https://issues.apache.org/jira/browse/PARQUET-780
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-0.1
> Reporter: Mike Trinkala
>
> Passing a NULL 'values' parameter into WriteBatch for a *ByteArray type will
> cause a segfault in the dictionary encoder.
> Related: https://issues.apache.org/jira/browse/PARQUET-719