k8ika0s commented on PR #48217: URL: https://github.com/apache/arrow/pull/48217#issuecomment-3568372602
@Vishwanatha-HD Something I’ve seen on s390x is that ByteStreamSplit behaves most predictably when the data feeding into the split is already in a well-defined byte order before the interleaving happens. When values arrive in native order on BE, the shuffling pattern can produce different byte layouts than what downstream readers or stats logic expect on LE hosts.

Looking at this patch, the swap + reversed-stream approach inside `DoSplitStreams` makes sense mechanically. I was wondering, though, how this interacts with callers that assume the inputs are already LE-normalized. In particular, mixed Arrow/non-Arrow inputs sometimes reveal subtle differences because Arrow arrays tend to carry scalars in canonical LE format even on BE machines.

On the merge side, I’m also curious whether the current stream reversal covers the cases where BE decoding would otherwise lean on helpers that expect the shuffled bytes to correspond to LE-origin data.

Not raising any correctness objections here, just sharing a few behaviors I’ve run into while testing BSS more broadly on BE systems.
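
To make the LE-normalization question concrete, here is a rough, self-contained sketch of the byte layout I have in mind. It is not the code in this patch and not Arrow's actual helpers: `HostIsLittleEndian`, `SplitFloatsAsLE`, and `MergeFloatsFromLE` are hypothetical names, and the sketch assumes IEEE 754 `float` values that arrive in native byte order. The idea is that stream `k` always holds byte `k` of the little-endian representation, so the encoded bytes come out the same on LE and BE hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an Arrow API): true when the host is little-endian.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper: split native-order floats into 4 byte streams so that
// stream k holds byte k of each value's little-endian representation.
inline std::vector<uint8_t> SplitFloatsAsLE(const std::vector<float>& values) {
  constexpr std::size_t kWidth = sizeof(float);
  const std::size_t n = values.size();
  const bool le = HostIsLittleEndian();
  std::vector<uint8_t> out(n * kWidth);
  for (std::size_t i = 0; i < n; ++i) {
    uint8_t raw[kWidth];
    std::memcpy(raw, &values[i], kWidth);
    for (std::size_t k = 0; k < kWidth; ++k) {
      // On a BE host, native byte (kWidth - 1 - k) is little-endian byte k.
      out[k * n + i] = le ? raw[k] : raw[kWidth - 1 - k];
    }
  }
  return out;
}

// Hypothetical helper: inverse of SplitFloatsAsLE, yielding native-order floats.
inline std::vector<float> MergeFloatsFromLE(const std::vector<uint8_t>& streams) {
  constexpr std::size_t kWidth = sizeof(float);
  assert(streams.size() % kWidth == 0);
  const std::size_t n = streams.size() / kWidth;
  const bool le = HostIsLittleEndian();
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    uint8_t raw[kWidth];
    for (std::size_t k = 0; k < kWidth; ++k) {
      // Place little-endian byte k at its native position for this host.
      raw[le ? k : kWidth - 1 - k] = streams[k * n + i];
    }
    std::memcpy(&out[i], raw, kWidth);
  }
  return out;
}
```

Under those assumptions, splitting `{1.0f, -2.0f}` should give the bytes `00 00 | 00 00 | 80 00 | 3F C0` on either kind of host. If a caller hands in values that were already swapped to LE on a BE machine, a routine like this would swap them a second time, which is the interaction I was asking about.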
