zanmato1984 commented on issue #39951: URL: https://github.com/apache/arrow/issues/39951#issuecomment-1943219652
> Enabling ASAN makes this crash go away. Enabling TSAN results in some reports which I describe in #40068, #40069. Building in debug mode results in [this assertion](https://github.com/apache/arrow/blob/0dbbd43ca9133912d1809394727784560cc5e797/cpp/src/arrow/compute/util.cc#L38) firing.
>
> Lowering `arrow::dataset::ScanOptions::batch_size` to 16 also fixes the crash (and lowering to 1024 does not).

Thanks for the experiments. I can only guess at what is happening, but I think we are making progress.

First, the errors reported by TSAN don't seem to be related to this crash, but the fired assertion does. It indicates that an Arrow-managed, stack-like temp buffer has overflowed, which may be causing the subsequent unexpected behavior. It also explains why lowering `batch_size` makes the crash go away: smaller batches require less temp space.

I can't explain why ASAN makes the crash go away, other than that it slows the program down significantly, which may reduce the chance of hitting the crash.

To verify whether the fired assertion is the root cause, could you try something similar to #40007 and see if it resolves the issue?
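To illustrate the reasoning about `batch_size`, here is a minimal sketch of a fixed-capacity, stack-like temp allocator. It is not Arrow's actual code: `ToyTempStack`, `BatchFits`, the 4096-byte capacity, and the 64 bytes per row are all hypothetical names and numbers chosen for illustration. The assumption is only that the temp space requested grows with the number of rows in a batch, so a small batch fits and a large one does not.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy allocator (NOT Arrow's implementation): a fixed-size
// buffer handed out in LIFO order, like a stack.
class ToyTempStack {
 public:
  explicit ToyTempStack(std::size_t capacity_bytes)
      : buffer_(capacity_bytes), top_(0) {}

  // Returns nullptr when the request would run past the end of the buffer.
  // An allocator that only checks this with a debug assertion would, in a
  // release build, hand out memory beyond the buffer instead.
  std::uint8_t* TryAlloc(std::size_t num_bytes) {
    if (num_bytes > buffer_.size() - top_) return nullptr;
    std::uint8_t* out = buffer_.data() + top_;
    top_ += num_bytes;
    return out;
  }

  // Callers must release exactly what they allocated, in reverse order.
  void Release(std::size_t num_bytes) { top_ -= num_bytes; }

  std::size_t used() const { return top_; }

 private:
  std::vector<std::uint8_t> buffer_;
  std::size_t top_;
};

// Hypothetical helper: assumes temp space is proportional to the batch size.
bool BatchFits(ToyTempStack& stack, std::size_t batch_rows,
               std::size_t bytes_per_row) {
  const std::size_t needed = batch_rows * bytes_per_row;
  if (stack.TryAlloc(needed) == nullptr) return false;
  stack.Release(needed);
  return true;
}
```

With a 4096-byte toy stack and 64 bytes per row, `BatchFits(stack, 16, 64)` succeeds and `BatchFits(stack, 1024, 64)` fails. The figures are made up, but the shape matches the observation that a batch size of 16 avoids the crash while 1024 does not.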
