Karakatiza666 commented on PR #438: URL: https://github.com/apache/arrow-js/pull/438#issuecomment-4533140203
Fair point: my `List<Float64>` analogy conflated two different things, and your clarification helps. Compliance is about correctly interpreting the offsets the wire format presents, not about the implementation realizing enough child storage to exercise the full offset range. Under that definition, List is conforming today (int32 offset values all fit in Number), while LargeList/LargeUtf8/LargeBinary aren't, because `bigIntToNumber` throws on offset values above `Number.MAX_SAFE_INTEGER` (2^53 - 1).

One thing I wasn't sure about in your reply: when you said "_that doesn't mean the List child length is constrained, it would mean the number of lists that the ListVector represented would be constrained_", I couldn't tell whether you were pointing at a mechanism in the codebase that sidesteps the single-ArrayBuffer limit on child storage. I didn't find one, so I focused on the wire-format parsing compliance angle, which is what the new commit I pushed addresses.

The commit: `VectorLoader` now rebases offsets to 0 on load for `LargeList`, `LargeUtf8`, and `LargeBinary`. After rebasing, in-memory offsets are always bounded by the child buffer's element count (which fits in Number for anything the runtime can allocate), so downstream narrowing succeeds for any spec-conforming wire input, including sliced views with absolute, non-rebased offsets. Inputs whose offsets imply a child buffer larger than JS's `ArrayBuffer` cap now fail honestly at child-buffer allocation in `readData`, rather than later at offset narrowing. That is a cleaner failure mode, and it is a property of the JS runtime, not of the implementation. Please let me know if this commit addresses your primary concern.

The remaining ceiling, JS's per-ArrayBuffer cap of ~2^32 bytes on a single contiguous child buffer, is allocation capacity, not interpretation, and it applies uniformly across List/LargeList/LargeUtf8/LargeBinary.
Lifting it would mean a chunked-children redesign (child as `Vector<U>` rather than a single `Data<U>`), which is a substantial design change. As I previously expressed, I think that's a separate, more ambitious effort and deserves its own issue.
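
To make the rebasing idea concrete, here is a minimal standalone sketch. It is not the code from the commit: `rebaseOffsets` and `toSafeNumber` are hypothetical helpers written for illustration, with `toSafeNumber` standing in for a checked bigint-to-Number narrowing. How the child buffer is aligned with the rebased offsets is outside this sketch.

```typescript
// Hypothetical helper (not the actual VectorLoader code): shift 64-bit
// offsets so that the first one becomes 0, preserving every list length.
function rebaseOffsets(offsets: BigInt64Array): BigInt64Array {
    if (offsets.length === 0) { return offsets; }
    const base = offsets[0];
    if (base === BigInt(0)) { return offsets; } // already zero-based
    const out = new BigInt64Array(offsets.length);
    for (let i = 0; i < offsets.length; i++) {
        out[i] = offsets[i] - base;
    }
    return out;
}

// Hypothetical stand-in for a checked narrowing: throws when the value
// cannot be represented exactly as a Number.
function toSafeNumber(value: bigint): number {
    if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
        throw new TypeError(`${value} is not safe to convert to a number.`);
    }
    return Number(value);
}

// A sliced view whose absolute offsets start above Number.MAX_SAFE_INTEGER.
const base = BigInt(Number.MAX_SAFE_INTEGER) + BigInt(11);
const wireOffsets = BigInt64Array.from([base, base + BigInt(3), base + BigInt(3), base + BigInt(8)]);

// Narrowing the absolute offsets directly would throw; after rebasing,
// every offset is bounded by the total element count (8 here).
const rebased = rebaseOffsets(wireOffsets);
const narrowed = Array.from(rebased, (v) => toSafeNumber(v));
console.log(narrowed);
```

The list lengths (3, 0, 5) are unchanged by the shift; only the origin moves, which is why the narrowing can succeed for offsets that are individually too large.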
