wjones127 commented on pull request #12216: URL: https://github.com/apache/arrow/pull/12216#issuecomment-1022834715
@emkornfield I have debugged further and I believe I have narrowed it down to the approximate place where the data is being corrupted, though it's very strange. I have added two `ValidateFull()` calls that seem to sit just before and just after the point where the corruption occurs. The one at `parquet/arrow/reader_internal.cc:780` passes, but the one at `parquet/arrow/reader.cc:482` fails. This is the error I get when I run it:

```
56: /Users/willjones/Documents/arrows/arrow/cpp/src/parquet/arrow/reader.cc:482: Check failed: _s.ok() Operation failed: out_->ValidateFull()
56: Bad status: Invalid: In chunk 0: Invalid: null_count value (854) doesn't match actual number of nulls in array (861)
56: /Users/willjones/Documents/arrows/arrow/cpp/src/arrow/array/validate.cc:118 ValidateNulls(*data.type)
```

The problem is that between those two points I see nothing that could alter the array. I am running this with `OMP_NUM_THREADS=1` and `OMP_THREAD_LIMIT=1`, and I can confirm that in my debugger I only see two threads (one worker and one worker loop). So I'm very confused as to what could be going on between those two calls.
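For context on the failing check: the message says the array's cached `null_count` (854) disagrees with the number of nulls actually found in the array (861). The sketch below illustrates that kind of comparison in a self-contained form. It is not Arrow's implementation: `ToyArray`, `CountNulls`, and `CheckNullCount` are hypothetical names introduced for illustration, and the sketch assumes a zero offset and an LSB-first validity bitmap in which a set bit means "valid".

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the parts of an array that matter for this check.
// Not an Arrow type; assumes offset 0.
struct ToyArray {
  std::vector<uint8_t> validity;  // bit i set => slot i is valid (LSB-first)
  int64_t length = 0;             // number of slots
  int64_t null_count = 0;         // cached count, stored separately from the bitmap
};

// Recount the nulls by scanning the validity bitmap.
int64_t CountNulls(const ToyArray& a) {
  int64_t nulls = 0;
  for (int64_t i = 0; i < a.length; ++i) {
    const bool valid = ((a.validity[i / 8] >> (i % 8)) & 1) != 0;
    if (!valid) ++nulls;
  }
  return nulls;
}

// Compare the cached count against the recount; the message mimics the
// wording of the error quoted above.
std::string CheckNullCount(const ToyArray& a) {
  const int64_t actual = CountNulls(a);
  if (actual == a.null_count) return "OK";
  return "Invalid: null_count value (" + std::to_string(a.null_count) +
         ") doesn't match actual number of nulls in array (" +
         std::to_string(actual) + ")";
}
```

In this toy model the cached count and the bitmap are independent pieces of state, so a mismatch means one of them changed without the other: for example, clearing a bit in `validity` after `null_count` was set makes `CheckNullCount` return the "doesn't match" message. Whether that is what happens between the two `ValidateFull()` calls here is not established by the output above.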
