jorisvandenbossche commented on issue #39562: URL: https://github.com/apache/arrow/issues/39562#issuecomment-1893199802
I was able to narrow the failure down to the following: https://github.com/apache/arrow/blob/ac50918a63f13088a5ea9e6c27a9fadbbb19d53f/cpp/src/arrow/dataset/file_parquet.cc#L814-L823

In the above snippet, `manifest_->descr->num_columns()` sometimes returns -1, so we then resize a vector with -1, which triggers the `std::length_error` crash. (See https://github.com/apache/arrow/pull/39567 for the reproducer: I added a Status check for the value being positive, and now the tests sometimes fail with that error instead of crashing.)

But I have no idea why it would sometimes return -1, and only on macOS when running the tests from an installed wheel (not in any of the other Mac builds where we build Arrow directly).

cc @pitrou @mapleFU in case you have any clue about why `parquet::arrow::SchemaManifest::descr::num_columns()` would sometimes return -1. This method returns the size of an underlying vector: https://github.com/apache/arrow/blob/ac50918a63f13088a5ea9e6c27a9fadbbb19d53f/cpp/src/parquet/schema.h#L437-L438
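To illustrate why a count of -1 ends in `std::length_error`, here is a minimal standalone sketch. It is not the Arrow code: `FakeNumColumns`, `TryResize`, and `CheckedResize` are hypothetical names invented for this example, and the guard only mimics the idea of validating the value before resizing, not the actual check added in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a column count that unexpectedly comes back as -1.
int FakeNumColumns() { return -1; }

// Resizes a fresh vector to `n` entries and reports what happened.
std::string TryResize(int n) {
  std::vector<int> values;
  try {
    // Converting -1 to std::size_t yields the largest representable size,
    // which exceeds values.max_size(), so resize() throws std::length_error.
    values.resize(static_cast<std::size_t>(n));
    return "ok";
  } catch (const std::length_error&) {
    return "length_error";
  }
}

// Guarded variant (an assumption about the general approach): reject a
// negative count up front instead of letting resize() throw.
bool CheckedResize(int n, std::vector<int>* out) {
  assert(out != nullptr);
  if (n < 0) {
    return false;  // caller can turn this into an error status
  }
  out->resize(static_cast<std::size_t>(n));
  return true;
}
```

With the common standard library implementations, `TryResize(FakeNumColumns())` should take the `length_error` path, while `CheckedResize(FakeNumColumns(), &v)` returns `false` and leaves `v` untouched, which is roughly the difference between the crash and a reportable test failure.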
