metsw24-max opened a new pull request, #50074:
URL: https://github.com/apache/arrow/pull/50074

   ### Rationale for this change
   
   The CSV block parser sizes its per-chunk value array from `num_cols` (the 
column count inferred from the first line of the input) multiplied by the number 
of rows in the chunk. `PresizedValueDescWriter` computes `2 + num_rows * num_cols`, and 
`ParseSpecialized` computes `num_cols_ * (num_rows_ - start) * 10`, both in 
`int32_t`. A CSV whose first line carries a few million fields pushes these 
products past `INT32_MAX`. That is signed integer overflow, which is undefined 
behavior (UBSan flags both expressions).
   
   ### What changes are included in this PR?
   
   Widen both multiplications to `int64_t`, matching their `int64_t` 
destinations.
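
As a rough illustration of the widening, the sketch below mirrors the shape of the two expressions. The function names `PresizedValueCount` and `SpecializedValueEstimate` are hypothetical stand-ins, not the Arrow functions, and the sketch assumes the operands are non-negative `int32_t` values with `start <= num_rows`. Casting one operand to `int64_t` before multiplying makes the whole product evaluate in 64 bits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Arrow code): same shape as `2 + num_rows * num_cols`,
// with the product evaluated in 64 bits because one operand is cast first.
constexpr int64_t PresizedValueCount(int32_t num_rows, int32_t num_cols) {
  return 2 + static_cast<int64_t>(num_rows) * num_cols;
}

// Hypothetical helper (not Arrow code): same shape as
// `num_cols * (num_rows - start) * 10`. The subtraction stays in int32_t,
// which is safe under the assumption 0 <= start <= num_rows.
constexpr int64_t SpecializedValueEstimate(int32_t num_cols, int32_t num_rows,
                                           int32_t start) {
  return static_cast<int64_t>(num_cols) * (num_rows - start) * 10;
}

// 1000 rows times 3,000,000 columns no longer fits in int32_t,
// but the widened result is well defined.
static_assert(PresizedValueCount(1000, 3000000) >
                  std::numeric_limits<int32_t>::max(),
              "widened product exceeds the int32_t range without overflowing");
```

The cast has to be applied to an operand rather than to the finished product: `static_cast<int64_t>(num_rows * num_cols)` would still multiply in `int32_t` and overflow before the conversion.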
   
   ### Are these changes tested?
   
   Existing CSV parser tests pass. The overflow was confirmed with a standalone 
UBSan build of the two expressions; the same build reports no errors after widening.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.