dmitry-chirkov-dremio opened a new issue, #49438:
URL: https://github.com/apache/arrow/issues/49438

   ### Describe the enhancement requested
   
   The `lpad_utf8_int32_utf8` and `rpad_utf8_int32_utf8` functions have a 
memory safety issue and two performance inefficiencies.
   
   **Memory safety issue:**
   
   When the fill string is longer than the padding space needed, the initial 
memcpy writes more bytes than allocated, causing a buffer overflow.
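
To make the overflow condition concrete, here is a minimal sketch of a clamped initial copy. It is not the Gandiva source: `copy_initial_fill` is a hypothetical helper, and the parameter names `fill_text_len` and `total_fill_bytes` are assumed to mean the byte length of the fill string and the byte size of the padding region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only, not the actual Gandiva code).
// Copies the first repetition of the fill text into `out`, never writing
// more than `total_fill_bytes` bytes. Returns the number of bytes written.
inline int32_t copy_initial_fill(char* out, int32_t total_fill_bytes,
                                 const char* fill_text, int32_t fill_text_len) {
  if (total_fill_bytes <= 0 || fill_text_len <= 0) return 0;
  assert(out != nullptr && fill_text != nullptr);
  // An unclamped memcpy(out, fill_text, fill_text_len) would write past the
  // padding region whenever fill_text_len > total_fill_bytes.
  const int32_t n = std::min(fill_text_len, total_fill_bytes);
  std::memcpy(out, fill_text, static_cast<size_t>(n));
  return n;
}
```

For example, with a 5-byte fill string and only 2 bytes of padding needed, the helper writes 2 bytes where an unclamped copy would write 5.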
   
   **Performance issues:**
   
   1. **Single-byte fill**: Iterates character-by-character even for 
single-byte fills like space padding, when a single `memset` call would suffice.
   
   2. **Multi-byte fill**: Copies the fill pattern character-by-character in 
O(n) iterations instead of using a doubling strategy with O(log n) memcpy calls.
   
   **Proposed fixes:**
   
   1. Use `std::min(fill_text_len, total_fill_bytes)` for the initial copy to 
prevent overflow
   2. Add single-byte fill fast path using `memset`
   3. Replace character-by-character loop with doubling strategy for multi-byte 
fills
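
The three fixes above could be combined roughly as follows. This is a sketch under stated assumptions, not the actual patch: `fill_padding` is a hypothetical helper, and it assumes the caller has already computed `total_fill_bytes` so that the padding region ends on a UTF-8 character boundary of the repeated fill pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only, not the actual Gandiva code).
// Fills out[0, total_fill_bytes) with repetitions of fill_text, truncating
// the final repetition so that nothing is written past the region.
inline void fill_padding(char* out, int32_t total_fill_bytes,
                         const char* fill_text, int32_t fill_text_len) {
  if (total_fill_bytes <= 0 || fill_text_len <= 0) return;
  assert(out != nullptr && fill_text != nullptr);

  // Fix 2: a single-byte fill (e.g. a space) needs only one memset call.
  if (fill_text_len == 1) {
    std::memset(out, static_cast<unsigned char>(fill_text[0]),
                static_cast<size_t>(total_fill_bytes));
    return;
  }

  // Fix 1: clamp the initial copy so it cannot overflow the padding region.
  int32_t filled = std::min(fill_text_len, total_fill_bytes);
  std::memcpy(out, fill_text, static_cast<size_t>(filled));

  // Fix 3: doubling. Each pass copies the already-filled prefix onto the
  // space right after it, so the filled length roughly doubles per memcpy.
  // Source and destination never overlap because chunk <= filled.
  while (filled < total_fill_bytes) {
    const int32_t chunk = std::min(filled, total_fill_bytes - filled);
    std::memcpy(out + filled, out, static_cast<size_t>(chunk));
    filled += chunk;
  }
}
```

Because the filled prefix is always a whole number of repetitions until the final, truncated copy, each doubling step continues the pattern in phase. Filling 7 bytes with `"abc"` takes three `memcpy` calls (3, then 3, then 1 bytes) and yields `abcabca`.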
   
   ### Component(s)
   
   C++, Gandiva
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
