stevenpall opened a new pull request, #65141:
URL: https://github.com/apache/doris/pull/65141

   On branch-4.1, `FromBase64Impl::vector` and `ToBase64Impl::vector` size the 
output scratch buffer as `cipher_len = srclen / 2`. base64 decode writes up to 
`srclen*3/4` bytes and encode writes up to `4*ceil(srclen/3)`, both larger than 
`srclen/2`. When `srclen/2` falls at or below `MAX_STACK_CIPHER_LEN` (64 KiB) 
while the real output exceeds 64 KiB, the write overflows the 64 KiB 
`stack_buf` and corrupts the stack frame, causing a delayed SIGSEGV inside 
`StringOP::push_value_string` (the output `ColumnString` PODArray reference is 
clobbered). It reproduces on valid, correctly padded base64 (input length a 
multiple of 4), so the length guard in #64788 does not prevent it.
   
   master is not affected. It was refactored to pre-reserve the output column 
at the true size (`total_size += len/4*3` for decode, `4*((len+2)/3)` for 
encode) and write directly, which removed the `srclen/2` scratch path. That 
change is not in any released 4.1.x, and the auto cherry-pick of #64788 to 
branch-4.1 is marked `dev/4.1.x-conflict`. This PR is the minimal sizing fix 
for the 4.1 line.
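   The master approach reserves the whole output up front. That idea can be sketched roughly as follows. `SimpleStringColumn` is a hypothetical stand-in for a string column: concatenated `chars` plus end `offsets`. It is not Doris's `ColumnString` API. Only the per-row formulas come from the description above.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <vector>

   // Hypothetical simplified string column: values are concatenated in
   // `chars`, and offsets[i] is the end position of row i.
   struct SimpleStringColumn {
       std::string chars;
       std::vector<std::size_t> offsets;

       void push(const std::string& v) {
           chars += v;
           offsets.push_back(chars.size());
       }
       std::size_t row_len(std::size_t i) const {
           return offsets[i] - (i == 0 ? 0 : offsets[i - 1]);
       }
   };

   // Total output bytes to reserve for decode: sum of len/4*3 per row.
   std::size_t reserve_for_decode(const SimpleStringColumn& col) {
       std::size_t total_size = 0;
       for (std::size_t i = 0; i < col.offsets.size(); ++i) total_size += col.row_len(i) / 4 * 3;
       return total_size;
   }

   // Total output bytes to reserve for encode: sum of 4*((len+2)/3) per row.
   std::size_t reserve_for_encode(const SimpleStringColumn& col) {
       std::size_t total_size = 0;
       for (std::size_t i = 0; i < col.offsets.size(); ++i) total_size += 4 * ((col.row_len(i) + 2) / 3);
       return total_size;
   }
   ```

   Once the column holds the summed upper bound, each row can be written straight into the 
output at its offset. No per-row scratch buffer or stack/heap split is left to mis-size.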
   
   Fix (matches `FromBase64BinaryImpl`, which already sizes correctly):
   
   ```cpp
   // FromBase64Impl::vector
   auto cipher_len = srclen / 4 * 3;
   // ToBase64Impl::vector
   auto cipher_len = (srclen + 2) / 3 * 4;
   ```
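   A quick way to sanity-check the two new sizes is to compare them against a real base64 encoder. The sketch below uses a minimal, hypothetical padded encoder (`b64_encode`), not Doris's implementation. Decoding `b64_encode(x)` yields exactly `x.size()` bytes, so that count is also what the decode buffer must hold.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <string>

   // Fixed sizing from this PR.
   std::size_t decode_cipher_len(std::size_t srclen) { return srclen / 4 * 3; }
   std::size_t encode_cipher_len(std::size_t srclen) { return (srclen + 2) / 3 * 4; }

   // Minimal standard base64 encoder with '=' padding (illustrative only).
   std::string b64_encode(const std::string& in) {
       static const char* kAlphabet =
           "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
       std::string out;
       std::size_t i = 0;
       for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups -> 4 chars
           unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                        (static_cast<unsigned char>(in[i + 1]) << 8) |
                        static_cast<unsigned char>(in[i + 2]);
           out += kAlphabet[(v >> 18) & 63];
           out += kAlphabet[(v >> 12) & 63];
           out += kAlphabet[(v >> 6) & 63];
           out += kAlphabet[v & 63];
       }
       std::size_t rest = in.size() - i;  // 1 or 2 trailing bytes get padded
       if (rest > 0) {
           unsigned v = static_cast<unsigned char>(in[i]) << 16;
           if (rest == 2) v |= static_cast<unsigned char>(in[i + 1]) << 8;
           out += kAlphabet[(v >> 18) & 63];
           out += kAlphabet[(v >> 12) & 63];
           out += rest == 2 ? kAlphabet[(v >> 6) & 63] : '=';
           out += '=';
       }
       return out;
   }

   // True if both fixed sizes cover the real output for an n-byte input.
   bool sizing_holds(std::size_t n) {
       std::string encoded = b64_encode(std::string(n, 'A'));
       // Encode size is exact for padded output; decoding `encoded` gives n bytes.
       return encoded.size() == encode_cipher_len(n) && n <= decode_cipher_len(encoded.size());
   }
   ```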
   
   Verification: built stock be-4.1.2 with only this change and ran it in a 
production disaggregated cluster. A query decoding a 120000-character valid 
base64 value crashed the BE before the fix and returns correctly after it. That 
value decodes to 90000 bytes, while `srclen/2` = 60000 is under the 64 KiB limit, 
so the stack buffer is used. Also confirmed across all rows wider than 100 KB in a 
real table (19382 rows, max decode 612 KB) with no BE restarts.
   
   Repro on stock 4.1.2:
   
   ```sql
   SELECT length(from_base64(v)) FROM t;  -- v: a 120000-char base64 string of a repeated byte
   ```
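   One way to build such a value is to repeat "QUFB", the base64 encoding of the three bytes "AAA". Repeating it 30000 times gives a 120000-character, fully padded string that decodes to 90000 bytes. The sketch assumes a hypothetical single-column table `t(v)` that already exists. The helper names are illustrative.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <string>

   // "QUFB" encodes "AAA", so k repetitions are valid base64 for 3*k bytes of 'A'.
   std::string repro_value(std::size_t groups = 30000) {
       std::string v;
       v.reserve(groups * 4);
       for (std::size_t i = 0; i < groups; ++i) v += "QUFB";
       return v;
   }

   // Builds an INSERT for the hypothetical single-column table `t(v)`.
   std::string repro_insert_sql() {
       return "INSERT INTO t VALUES ('" + repro_value() + "');";
   }
   ```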
   


-- 
This is an automated message from the Apache Git Service.