Rene Sugar created ARROW-1611:
---------------------------------

             Summary: Crash in BitmapReader when length is zero
                 Key: ARROW-1611
                 URL: https://issues.apache.org/jira/browse/ARROW-1611
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.7.1
         Environment: Mac OS X 10.11.6
            Reporter: Rene Sugar
             Fix For: 0.8.0


This was found when applying the fix for ARROW-1601 to parquet-cpp.

BitmapReader can be constructed with a length of zero. The constructor then 
reads the first byte of the bitmap unconditionally, which results in 
EXC_BAD_ACCESS when that byte lies outside the allocated buffer.
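
A minimal sketch of one way to guard the constructor, assuming the reader 
has the same members as the code at bit-util.h:96-99 in the trace below. The 
class name GuardedBitmapReader and the helper CountSetBits are hypothetical 
and exist only for illustration; this is not necessarily the fix that was 
applied in Arrow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bitmap reader; not the actual Arrow class.
class GuardedBitmapReader {
 public:
  GuardedBitmapReader(const uint8_t* bitmap, int64_t start_offset, int64_t length)
      : bitmap_(bitmap), position_(0), length_(length), current_byte_(0) {
    assert(length >= 0);
    byte_offset_ = start_offset / 8;
    bit_offset_ = start_offset % 8;
    // Only touch the bitmap when there is at least one bit to read.
    if (length > 0) {
      current_byte_ = bitmap[byte_offset_];
    }
  }

  bool IsSet() const { return (current_byte_ & (1 << bit_offset_)) != 0; }

  void Next() {
    ++bit_offset_;
    ++position_;
    if (bit_offset_ == 8) {
      bit_offset_ = 0;
      ++byte_offset_;
      // Avoid reading one byte past the end after the last bit.
      if (position_ < length_) {
        current_byte_ = bitmap_[byte_offset_];
      }
    }
  }

 private:
  const uint8_t* bitmap_;
  int64_t position_;
  int64_t length_;
  uint8_t current_byte_;
  int64_t byte_offset_;
  int64_t bit_offset_;
};

// Hypothetical helper: counts set bits in [start_offset, start_offset + length).
int64_t CountSetBits(const uint8_t* bitmap, int64_t start_offset, int64_t length) {
  GuardedBitmapReader reader(bitmap, start_offset, length);
  int64_t count = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (reader.IsSet()) ++count;
    reader.Next();
  }
  return count;
}
```

With the guard in place, a call such as CountSetBits(bitmap, 1048576, 0) 
never dereferences the bitmap pointer, which matches the arguments seen in 
the crashing frame (start_offset=1048576, length=0).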

The call stack shows BitmapWriter rather than BitmapReader because I added a 
BitmapWriter class to replace the INIT_BITSET/READ_NEXT_BITSET-style code 
that writes bitmaps in DefinitionLevelsToBitmap 
(parquet-cpp/src/parquet/column_reader.h). The two constructors are 
identical, so the compiler merged them.


Process 17313 launched: './bin/FileConvert' (x86_64)
Input files are: 
../../parquet-data/State_Drug_Utilization_Data_2016.csv
Processing input file: ../../parquet-data/State_Drug_Utilization_Data_2016.csv
Process 17313 stopped
* thread #1: tid = 0x4be842, 0x0000000101840fe9 
libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908,
 bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99, queue = 
'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, 
address=0x106ba0000)
    frame #0: 0x0000000101840fe9 
libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908,
 bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99
   96     : bitmap_(bitmap), position_(0), length_(length) {
   97       byte_offset_ = start_offset / 8;
   98       bit_offset_ = start_offset % 8;
-> 99       current_byte_ = bitmap[byte_offset_];
   100    }
   101  
   102    void Set() { current_byte_ |= (1 << bit_offset_); }
(lldb) thread backtrace
* thread #1: tid = 0x4be842, 0x0000000101840fe9 
libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908,
 bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99, queue = 
'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, 
address=0x106ba0000)
  * frame #0: 0x0000000101840fe9 
libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908,
 bitmap="}", start_offset=1048576, length=0) + 89 at bit-util.h:99
    frame #1: 0x0000000101840ded 
libparquet.1.dylib`arrow::internal::BitmapWriter::BitmapWriter(this=0x00007fff5fbf2908,
 bitmap="}", start_offset=1048576, length=0) + 45 at bit-util.h:96
    frame #2: 0x0000000101964bf3 
libparquet.1.dylib`parquet::Encoder<parquet::DataType<(parquet::Type::type)4> 
>::PutSpaced(this=0x0000000109b08bb0, src=0x000000012b86b000, num_values=0, 
valid_bits="}", valid_bits_offset=1048576) + 1747 at encoding.h:62
    frame #3: 0x0000000101931913 
libparquet.1.dylib`parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)4>
 >::WriteValuesSpaced(this=0x0000000109b08cb8, num_values=0, valid_bits="}", 
valid_bits_offset=1048576, values=0x000000012b86b000) + 115 at 
column_writer.cc:612

To reproduce this problem:

1) Download the CSV file "State Drug Utilization Data 2016".
Catalog: https://catalog.data.gov/dataset?res_format=CSV
Direct download: 
https://data.medicaid.gov/api/views/3v6v-qk5s/rows.csv?accessType=DOWNLOAD

2) Run FileConvert (see https://github.com/renesugar/FileConvert)
./bin/FileConvert -i ./State_Drug_Utilization_Data_2016.csv -o ./State_Drug_Utilization_Data_2016.parquet

(FileConvert is built using the same process as MapD.)



