hqx871 opened a new issue, #35616:
URL: https://github.com/apache/arrow/issues/35616

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi team! I am using Arrow 0.15.1 and hit a problem when reading a Parquet file that contains an array column.
   - The ASan output:
   parquet-low-level-example(49396,0x7ff848622680) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
   /Users/bytedance/Downloads/test.parquet row num:1000000
   =================================================================
   ==49396==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0001087d73f8 at pc 0x0001076ecb8d bp 0x7ff7b8b4d6b0 sp 0x7ff7b8b4d6a8
   WRITE of size 8 at 0x0001087d73f8 thread T0
       #0 0x1076ecb8c in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long) rle_encoding.h:488
       #1 0x1076e62c8 in parquet::DictDecoderImpl<parquet::PhysicalType<(parquet::Type::type)2> >::DecodeSpaced(long long*, int, int, unsigned char const*, long long) encoding.cc:1079
       #2 0x1075d9e6b in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadValuesSpaced(long long, long long) column_reader.cc:1052
       #3 0x1075dc1a9 in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecordData(long long) column_reader.cc:1096
       #4 0x1075d6a4c in parquet::internal::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)2> >::ReadRecords(long long) column_reader.cc:822
       #5 0x1073d1583 in parquet::arrow::LeafReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:414
       #6 0x1073d55bd in parquet::arrow::NestedListReader::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*) reader.cc:469
       #7 0x1073f5a82 in parquet::arrow::RowGroupRecordBatchReader::ReadNext(std::__1::shared_ptr<arrow::RecordBatch>*) reader.cc:320
       #8 0x1073b409a in printParquetFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) reader-writer.cc:97
       #9 0x1073b5209 in main reader-writer.cc:111
       #10 0x7ff8049b230f  (<unknown module>)
   
   0x0001087d73f8 is located 40 bytes to the left of global variable 'guard variable for arrow::SparseTensor::dim_name(int) const::kEmpty' defined in '/Users/bytedance/Downloads/arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc' (0x1087d7420) of size 8
   0x0001087d73f8 is located 0 bytes to the right of global variable 'kEmpty' defined in '/Users/bytedance/Downloads/arrow-apache-arrow-0.15.1/cpp/src/arrow/sparse_tensor.cc:415:28' (0x1087d73e0) of size 24
   SUMMARY: AddressSanitizer: global-buffer-overflow rle_encoding.h:488 in int arrow::util::RleDecoder::GetBatchWithDictSpaced<long long>(long long const*, long long*, int, int, unsigned char const*, long long)
   Shadow bytes around the buggy address:
     0x1000210fae20: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
     0x1000210fae30: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 00 f9 f9 f9
     0x1000210fae40: 00 00 f9 f9 00 f9 f9 f9 01 f9 f9 f9 01 f9 f9 f9
     0x1000210fae50: 01 f9 f9 f9 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9
     0x1000210fae60: 01 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 00 00
   =>0x1000210fae70: 00 00 00 f9 f9 f9 f9 f9 00 00 00 00 00 00 00[f9]
     0x1000210fae80: f9 f9 f9 f9 00 f9 f9 f9 00 00 00 f9 f9 f9 f9 f9
     0x1000210fae90: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faea0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faeb0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
     0x1000210faec0: 00 f9 f9 f9 00 00 f9 f9 00 f9 f9 f9 00 00 f9 f9
   Shadow byte legend (one shadow byte represents 8 application bytes):
     Addressable:           00
     Partially addressable: 01 02 03 04 05 06 07 
     Heap left redzone:       fa
     Freed heap region:       fd
     Stack left redzone:      f1
     Stack mid redzone:       f2
     Stack right redzone:     f3
     Stack after return:      f5
     Stack use after scope:   f8
     Global redzone:          f9
     Global init order:       f6
     Poisoned by user:        f7
     Container overflow:      fc
     Array cookie:            ac
     Intra object redzone:    bb
     ASan internal:           fe
     Left alloca redzone:     ca
     Right alloca redzone:    cb
   ==49396==ABORTING
   
   Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
   
   ### Component(s)
   
   C++


