jorisvandenbossche commented on issue #39562:
URL: https://github.com/apache/arrow/issues/39562#issuecomment-1893199802

   So I could nail down the failure to the following:
   
   
https://github.com/apache/arrow/blob/ac50918a63f13088a5ea9e6c27a9fadbbb19d53f/cpp/src/arrow/dataset/file_parquet.cc#L814-L823
   
   In the above snippet, `manifest_->descr->num_columns()` sometimes returns 
-1, so we then resize a vector with -1, which triggers the 
`std::length_error` crash. 
   (see https://github.com/apache/arrow/pull/39567 for the reproducer: I added 
a Status check for the value being positive, and now the tests sometimes fail 
with that error instead of crashing)
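   
   To illustrate the failure mode, here is a minimal standalone sketch. It is 
not Arrow code: `ResizeThrowsLengthError` and `SafeResize` are hypothetical 
helper names, `std::vector<int>` is only a stand-in for the vector resized in 
the linked snippet, and a plain `bool` plus error string stands in for an 
Arrow `Status`. It assumes the count is a signed `int` that gets converted to 
the vector's unsigned size type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: resize a vector using a signed column count and
// report whether std::length_error was thrown.
bool ResizeThrowsLengthError(int num_columns) {
  std::vector<int> column_flags;
  try {
    // A negative int converts to a huge std::size_t (SIZE_MAX for -1),
    // which is larger than vector::max_size().
    column_flags.resize(static_cast<std::size_t>(num_columns));
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}

// Hypothetical guard, in the spirit of the check added in the PR:
// reject a negative count before it ever reaches resize().
bool SafeResize(std::vector<int>& column_flags, int num_columns,
                std::string* error) {
  if (num_columns < 0) {
    *error = "negative column count: " + std::to_string(num_columns);
    return false;
  }
  column_flags.resize(static_cast<std::size_t>(num_columns));
  return true;
}
```

   With a guard like this, a bad count surfaces as a reportable error rather 
than as an uncaught exception that terminates the process.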
   
   But I have no idea why that would sometimes return -1, and only on macOS 
when running the test from an installed wheel (not in any of the other Mac 
builds, where we build Arrow directly).
   
   cc @pitrou @mapleFU in case you have any clue about why 
`parquet::arrow::SchemaManifest::descr::num_columns()` would sometimes return 
-1. This method returns the size of an underlying vector:
   
   
https://github.com/apache/arrow/blob/ac50918a63f13088a5ea9e6c27a9fadbbb19d53f/cpp/src/parquet/schema.h#L437-L438


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
