[ 
https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280877#comment-16280877
 ] 

Jian Fang commented on PARQUET-1169:
------------------------------------

It looks like these two lines in {{record_reader.cc}} are problematic:
{code}
std::copy(def_data + levels_position_, def_data + levels_written_, def_data);
std::copy(rep_data + levels_position_, rep_data + levels_written_, rep_data);
{code}

I dumped the values of these variables just before the two lines above:

{code}
--------- Record Reader Internal State ---------
num_buffered_values_ 14
num_decoded_values_ 1
max_def_level_ 1
max_rep_level_ 0
nullable_values_ 1
at_record_start_ 0
records_read_ 0
values_written_ 0
values_capacity_ 0
null_count_ 0
levels_written_ 14
levels_position_ 1
levels_capacity_ 32
def_data length 4 // sizeof(def_levels()) / sizeof(int16_t)
rep_data length 4 // sizeof(rep_levels()) / sizeof(int16_t)
{code}
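
One caveat on the last two lines of the dump. If {{def_levels()}} and {{rep_levels()}} return a raw {{int16_t*}} (I believe they do, but treat that as an assumption), then {{sizeof}} measures the pointer rather than the buffer. On a typical 64-bit build that is 8 / 2 = 4 no matter how many levels are allocated. A minimal sketch with a hypothetical stand-in reader ({{FakeReader}} is not a parquet-cpp class):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a reader that exposes its level buffer
// through a raw pointer. This is NOT the parquet-cpp RecordReader.
struct FakeReader {
  std::vector<int16_t> levels = std::vector<int16_t>(32, 0);
  int16_t* def_levels() { return levels.data(); }
};

// What the dump computed: size of the pointer divided by the element size.
// The result depends only on the platform's pointer width.
std::size_t SizeofBasedLength(FakeReader& reader) {
  return sizeof(reader.def_levels()) / sizeof(int16_t);
}

// The number of elements the buffer actually holds.
std::size_t ActualLength(const FakeReader& reader) {
  return reader.levels.size();
}
```

So the "length 4" in the dump may say nothing about the real buffer size; {{levels_capacity_}} (32 here) is probably the more meaningful number to compare against.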

It seems that the lengths of {{def_data}} and {{rep_data}} are less than 
{{levels_written_ - levels_position_}}. Does anyone know whether this is 
expected?
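
For reference, my understanding is that the two {{std::copy}} calls shift the unconsumed levels in the range [{{levels_position_}}, {{levels_written_}}) to the front of the buffer. With the dumped values that moves 14 - 1 = 13 elements, which fits in a buffer of {{levels_capacity_}} = 32 entries, provided the buffer really holds that many. The overlap is allowed by {{std::copy}} because the destination starts before the source range. A sketch under those assumptions ({{CompactLevels}} is a hypothetical helper, not a function in {{record_reader.cc}}):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the shift done by the two std::copy lines.
// Moves the levels in [levels_position, levels_written) to the start of
// `data` and returns how many levels remain buffered.
// The caller must guarantee that `data` holds at least `levels_written`
// elements; otherwise the copy reads out of bounds.
int64_t CompactLevels(int16_t* data, int64_t levels_position,
                      int64_t levels_written) {
  // std::copy requires that the destination is not inside the source
  // range, so skip the call when there is nothing to shift.
  if (levels_position > 0) {
    std::copy(data + levels_position, data + levels_written, data);
  }
  return levels_written - levels_position;
}
```

If this reading is right, the copy is only unsafe when the underlying buffer is smaller than {{levels_written_}} elements or when the pointers are stale, for example after the buffer was resized.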

> Segment fault when using NextBatch of parquet::arrow::ColumnReader in 
> parquet-cpp
> ---------------------------------------------------------------------------------
>
>                 Key: PARQUET-1169
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1169
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Jian Fang
>         Attachments: test.parquet
>
>
> When I run the code below, I consistently get a segmentation fault. I am not 
> sure whether this is a bug or whether I did something wrong. Could anyone 
> here help me take a look?
> {code:c++}
> #include <iostream>
> #include <memory>
> #include <string>
> #include "arrow/array.h"
> #include "arrow/io/file.h"
> #include "arrow/test-util.h"
> #include "parquet/arrow/reader.h"
> using arrow::Array;
> using arrow::default_memory_pool;
> using arrow::io::FileMode;
> using arrow::io::MemoryMappedFile;
> using parquet::arrow::ColumnReader;
> using parquet::arrow::FileReader;
> using parquet::arrow::OpenFile;
> int main(int argc, char** argv) {
>   if (argc > 1) {
>     std::string file_name = argv[1];
>     std::shared_ptr<MemoryMappedFile> file;
>     ABORT_NOT_OK(MemoryMappedFile::Open(file_name, FileMode::READ, &file));
>     std::unique_ptr<FileReader> file_reader;
>     ABORT_NOT_OK(OpenFile(file, default_memory_pool(), &file_reader));
>     std::unique_ptr<ColumnReader> column_reader;
>     ABORT_NOT_OK(file_reader->GetColumn(0, &column_reader));
>     std::shared_ptr<Array> array1;
>     ABORT_NOT_OK(column_reader->NextBatch(1, &array1));
>     std::cout << "length " << array1->length() << std::endl;
>     std::shared_ptr<Array> array2;
>     // the segmentation fault occurs on this second NextBatch call
>     ABORT_NOT_OK(column_reader->NextBatch(1, &array2));
>     std::cout << "length " << array2->length() << std::endl;
>   }
>   return 0;
> }
> {code}
> Command to compile this program:
> {code}
> g++ test.c -I/usr/local/include/arrow -I/usr/local/include/parquet 
> --std=c++11 -lparquet -larrow -lgtest -o parquet_test
> {code}
> Command to run the program:
> {code}
> ./parquet_test test.parquet
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
