Re: [PR] GH-48216: [C++][Parquet] Fix Util Byte Stream Split Internal logic to enable Parquet DB support on s390x [arrow]

via GitHub Sun, 23 Nov 2025 13:55:00 -0800


k8ika0s commented on PR #48217:
URL: https://github.com/apache/arrow/pull/48217#issuecomment-3568372602


   @Vishwanatha-HD 
   
   Something I’ve seen on s390x is that ByteStreamSplit behaves most 
predictably when the values are normalized to a well-defined byte order before 
the interleaving happens. When values arrive in native order on BE, the split 
can produce a byte layout that differs from what downstream readers or stats 
logic on LE hosts expect.
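   
   A minimal sketch of that ordering, for illustration only. This is not the 
code in this PR: `StoreLittleEndian` and `SplitStreamsLE` are hypothetical 
names, and the sketch assumes 32-bit IEEE floats. Each value is reduced to LE 
bytes with shifts before it is scattered, so the streams should come out 
identical on BE and LE hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical helper (not an Arrow API): writes the bytes of `value` in
// little-endian order. Shifts operate on the integer value, so the result
// does not depend on host endianness.
inline void StoreLittleEndian(float value, uint8_t* out) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  for (size_t b = 0; b < sizeof(bits); ++b) {
    out[b] = static_cast<uint8_t>(bits >> (8 * b));
  }
}

// Hypothetical sketch of a byte-stream split: stream k (k = 0..3) holds
// byte k of the LE representation of every value, streams stored back to back.
// `out` must have room for 4 * n bytes.
inline void SplitStreamsLE(const float* values, size_t n, uint8_t* out) {
  assert(n == 0 || (values != nullptr && out != nullptr));
  constexpr size_t kWidth = sizeof(float);
  for (size_t i = 0; i < n; ++i) {
    uint8_t le[kWidth];
    StoreLittleEndian(values[i], le);  // normalize first
    for (size_t k = 0; k < kWidth; ++k) {
      out[k * n + i] = le[k];          // then interleave
    }
  }
}
```

   With this shape the caller never has to know whether the input was 
LE-normalized beforehand, since the normalization is part of the split itself.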
   
   Looking at this patch, the swap + reversed-stream approach inside 
`DoSplitStreams` makes sense mechanically. I was wondering, though, how this 
interacts with callers that assume the inputs are already LE-normalized. In 
particular, mixed Arrow/non-Arrow inputs sometimes reveal subtle differences 
because Arrow arrays tend to carry scalars in canonical LE format even on BE 
machines.
   
   On the merge side, I’m also curious whether the current stream reversal 
covers the cases where BE decoding would otherwise lean on helpers that expect 
the shuffled bytes to correspond to LE-origin data.
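   
   To make the merge-side question concrete, here is a hedged sketch of a 
decoder that treats the streams as LE-origin data. `MergeStreamsLE` is a 
hypothetical name, not a function in this PR or in Arrow, and 32-bit IEEE 
floats are assumed. Stream k supplies byte k (least significant first), and the 
value is rebuilt with shifts so the result is the same on BE and LE hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical sketch of a byte-stream merge: `streams` holds 4 streams of
// n bytes each, back to back, where stream k carries byte k of the LE
// representation of every value. `out` must have room for n floats.
inline void MergeStreamsLE(const uint8_t* streams, size_t n, float* out) {
  assert(n == 0 || (streams != nullptr && out != nullptr));
  constexpr size_t kWidth = sizeof(float);
  for (size_t i = 0; i < n; ++i) {
    uint32_t bits = 0;
    for (size_t k = 0; k < kWidth; ++k) {
      // Byte k is the k-th least significant byte of the value.
      bits |= static_cast<uint32_t>(streams[k * n + i]) << (8 * k);
    }
    std::memcpy(&out[i], &bits, sizeof(bits));  // host-order float
  }
}
```

   A helper written this way only gives correct values if the encoder also 
emitted streams in LE byte order, which is the assumption I am asking about.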
   
   I'm not raising any correctness objections here, just sharing a few 
behaviors I’ve run into while testing BSS more broadly on BE systems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

