Karakatiza666 commented on PR #438:
URL: https://github.com/apache/arrow-js/pull/438#issuecomment-4533140203

   Fair point — my `List<Float64>` analogy conflated two different things, and 
your clarification helps. Compliance is about correctly interpreting offsets 
the wire format presents, not about the implementation realizing enough child 
storage to exercise the full offset range. Under that definition, List is 
conforming today (int32 offset values all fit in Number), and 
LargeList/LargeUtf8/LargeBinary aren't, because `bigIntToNumber` throws on 
offset values above `Number.MAX_SAFE_INTEGER` (2^53 - 1).
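
   To make the distinction concrete, here is a minimal sketch of the narrowing step. `narrowOffset` and `tryNarrow` are hypothetical stand-ins I wrote for illustration; they mirror the behaviour described above but are not the library's actual source.

```typescript
// Hypothetical stand-in for the bigint -> number narrowing step (an assumption,
// not the actual arrow-js implementation).
function narrowOffset(value: bigint): number {
    if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
        throw new TypeError(`${value} is not safe to convert to a number.`);
    }
    return Number(value);
}

// Helper for the demo: returns null instead of throwing.
function tryNarrow(value: bigint): number | null {
    try {
        return narrowOffset(value);
    } catch {
        return null;
    }
}

// Every int32 offset (List) is far below the safe-integer limit.
const maxInt32Offset = tryNarrow(BigInt(2147483647));

// An int64 offset (LargeList / LargeUtf8 / LargeBinary) can exceed it.
const tooLarge = tryNarrow(BigInt("9007199254740992")); // 2^53

console.log(maxInt32Offset, tooLarge);
```

   The first value narrows cleanly, while the second is rejected even though it is a perfectly valid int64 offset on the wire.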
   
   One thing I wasn't sure about in your reply: when you said "_that doesn't 
mean the List child length is constrained, it would mean the number of lists 
that the ListVector represented would be constrained_" I couldn't tell whether 
you were pointing at a mechanism in the codebase that sidesteps the 
single-ArrayBuffer limit on child storage — I didn't find one, so I focused on 
the wire-format parsing compliance angle, which is what a new commit I pushed 
addresses.
   
   The commit: `VectorLoader` now rebases offsets to 0 on load for `LargeList`, 
`LargeUtf8`, and `LargeBinary`. After rebasing, in-memory offsets are always 
bounded by the child buffer's element count (which fits in Number for anything 
the runtime can allocate), so downstream narrowing succeeds for any 
spec-conforming wire input — including sliced views with absolute, non-rebased 
offsets. Inputs whose offsets imply a child buffer larger than JS's 
`ArrayBuffer` cap now fail honestly at child-buffer allocation in `readData`, 
rather than later at offset narrowing — a cleaner failure mode that's a 
property of the JS runtime, not of the implementation.
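
   As a rough illustration of the rebasing idea (not the code in the commit), the sketch below subtracts the first offset from every entry. `rebaseOffsets` is a hypothetical helper name, and the sketch deliberately leaves out how the loader locates the matching child values.

```typescript
// Hypothetical helper illustrating offset rebasing; name and signature are
// assumptions, not the VectorLoader code from the commit.
function rebaseOffsets(offsets: BigInt64Array): BigInt64Array {
    const rebased = new BigInt64Array(offsets.length);
    if (offsets.length === 0) {
        return rebased;
    }
    const first = offsets[0];
    for (let i = 0; i < offsets.length; i++) {
        // After subtraction every offset is relative to the first one.
        rebased[i] = offsets[i] - first;
    }
    return rebased;
}

// A sliced view: absolute offsets sit above 2^53, but the span is tiny.
const absoluteBase = BigInt("9007199254740993"); // 2^53 + 1
const sliced = BigInt64Array.from([
    absoluteBase,
    absoluteBase + BigInt(3),
    absoluteBase + BigInt(7),
]);

const rebased = rebaseOffsets(sliced);

// Narrowing is now safe because each value is bounded by the span (7 here).
const narrowed = Array.from(rebased, (v) => Number(v));
console.log(narrowed);
```

   The absolute offsets would have failed narrowing, whereas the rebased ones are bounded by the span of the slice.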
   
   Please let me know if this commit addresses your primary concern.
   
   The remaining ceiling — JS's per-ArrayBuffer cap of ~2^32 bytes on a single 
contiguous child buffer — is allocation capacity, not interpretation, and it 
applies uniformly across List/LargeList/LargeUtf8/LargeBinary. Lifting it would 
mean a chunked-children redesign (child as `Vector<U>` rather than a single 
`Data<U>`), which is a substantial design change. As I previously expressed, I 
think that's a separate, more ambitious effort and deserves its own issue.


