Vishwanatha-HD commented on PR #48203: URL: https://github.com/apache/arrow/pull/48203#issuecomment-3570805783
> @Vishwanatha-HD Hey! Really appreciate you taking on the encoder/decoder paths for s390x — these two files are where a lot of the subtle BE issues first show up.
>
> One thing I ran into on real s390x hardware is that Arrow’s array buffers already store their scalars in canonical little-endian format. Because of that, per-value swapping inside the Plain/Arrow fast paths can sometimes lead to an unintended double-swap, especially when mixing Arrow-originated inputs with non-Arrow callers (e.g., DeltaBitPack or ByteStreamSplit feeding into the same decode path).
>
> A couple of spots I’m curious about here:
>
> **• PlainDecoder → Arrow fast path** The `#if ARROW_LITTLE_ENDIAN` branches check out for host-native buffers, but Arrow itself always hands you LE data. Does this path avoid re-swapping values that are already canonical LE from Arrow? I’ve seen that cause subtle mismatches on BE when DeltaBitPack or BSS push Arrow arrays directly into the decoder.
>
> **• PlainEncoder (primitive path)** On BE, the per-value `ToLittleEndian` write works for correctness, though I’ve found cases where staging to a single LE scratch buffer helps avoid partial/mixed-endian outputs when builders and sinks run back-to-back.
>
> **• ByteStreamSplit** Here the code assumes `DoSplitStreams` handles endianness, but BSS usually expects inputs to already be in canonical LE order before the streams are interleaved. With native-order buffers on BE, the shuffle sometimes produces different stats/dictionary bytes across architectures. Curious if you’ve tested mixed Arrow/non-Arrow inputs through this path?
>
> None of this blocks the PR — just sharing things I hit in BE testing across the encode/decode → stats → page-index chain.

@k8ika0s Thanks for your review comments. While I agree with you that, from a structural point of view, things could be a lot better, please note that I have tested these changes on s390x systems and also on OpenShift AI workloads, and they work properly. Hence there is no concern with these changes.
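For reference, here is a minimal, self-contained sketch of the double-swap concern quoted above. `DecodePlainInt32` and the `input_is_canonical_le` flag are hypothetical names used only for illustration, not the actual Arrow/Parquet internals; the point is that on a big-endian host such as s390x the LE-to-native swap must happen exactly once.

```cpp
// Sketch of the double-swap hazard on a big-endian host.
// DecodePlainInt32 and input_is_canonical_le are hypothetical illustrations,
// not the actual Arrow/Parquet decoder code.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

namespace {

bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

uint32_t ByteSwap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Parquet PLAIN data is canonical little-endian on disk, so on a big-endian
// host the swap to native order is needed exactly once. If an upstream stage
// has already converted the buffer to native order, swapping here again
// ("double-swap") silently corrupts the values.
void DecodePlainInt32(const uint8_t* src, int64_t n, int32_t* dst,
                      bool input_is_canonical_le) {
  std::memcpy(dst, src, static_cast<size_t>(n) * sizeof(int32_t));
  if (!HostIsLittleEndian() && input_is_canonical_le) {
    for (int64_t i = 0; i < n; ++i) {
      uint32_t tmp;
      std::memcpy(&tmp, &dst[i], sizeof(tmp));
      tmp = ByteSwap32(tmp);
      std::memcpy(&dst[i], &tmp, sizeof(tmp));
    }
  }
}

}  // namespace

int main() {
  const std::vector<int32_t> values = {1, 2, 42};

  // Serialize to canonical little-endian bytes, independent of host order.
  std::vector<uint8_t> le_bytes(values.size() * sizeof(int32_t));
  for (size_t i = 0; i < values.size(); ++i) {
    uint32_t v = static_cast<uint32_t>(values[i]);
    if (!HostIsLittleEndian()) v = ByteSwap32(v);
    std::memcpy(&le_bytes[i * sizeof(int32_t)], &v, sizeof(v));
  }

  std::vector<int32_t> decoded(values.size());
  DecodePlainInt32(le_bytes.data(), static_cast<int64_t>(values.size()),
                   decoded.data(), /*input_is_canonical_le=*/true);
  std::cout << decoded[2] << "\n";  // Prints 42 on both LE and BE hosts.
  return 0;
}
```

If a caller passed `input_is_canonical_le=true` for a buffer that had already been converted to native order upstream, the values would be swapped a second time, which is the kind of mismatch described for DeltaBitPack/BSS feeding the same decode path.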

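A similar sketch for the ByteStreamSplit point: staging values in a canonical little-endian scratch buffer before the byte streams are interleaved keeps the encoded bytes identical on LE and BE hosts. `EncodeByteStreamSplitFloat` below is a hypothetical helper used only for illustration, not the actual `DoSplitStreams` implementation.

```cpp
// Sketch: ByteStreamSplit over canonical-LE bytes is architecture-independent.
// EncodeByteStreamSplitFloat is a hypothetical helper, not Arrow/Parquet code.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

namespace {

bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Stage values as canonical little-endian bytes, then split: output stream k
// holds byte k of every value. Because the split runs over the LE staging
// buffer, the encoded bytes are identical on LE and BE hosts; splitting
// host-native BE bytes directly would produce different output on s390x.
std::vector<uint8_t> EncodeByteStreamSplitFloat(const std::vector<float>& values) {
  constexpr size_t kWidth = sizeof(float);
  const size_t n = values.size();

  // 1) Stage to a canonical-LE scratch buffer.
  std::vector<uint8_t> le_scratch(n * kWidth);
  for (size_t i = 0; i < n; ++i) {
    uint8_t raw[kWidth];
    std::memcpy(raw, &values[i], kWidth);
    for (size_t b = 0; b < kWidth; ++b) {
      le_scratch[i * kWidth + b] =
          HostIsLittleEndian() ? raw[b] : raw[kWidth - 1 - b];
    }
  }

  // 2) Interleave the LE bytes into per-byte streams.
  std::vector<uint8_t> out(n * kWidth);
  for (size_t b = 0; b < kWidth; ++b) {
    for (size_t i = 0; i < n; ++i) {
      out[b * n + i] = le_scratch[i * kWidth + b];
    }
  }
  return out;
}

}  // namespace

int main() {
  const std::vector<float> values = {1.0f, 2.0f, 3.0f};
  const std::vector<uint8_t> encoded = EncodeByteStreamSplitFloat(values);
  // The printed byte sequence is the same on x86_64 and s390x.
  for (uint8_t byte : encoded) std::cout << static_cast<int>(byte) << " ";
  std::cout << "\n";
  return 0;
}
```

The same staging idea applies to the PlainEncoder remark above: writing values once into an LE scratch buffer and flushing it avoids partial or mixed-endian output when builders and sinks run back-to-back.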