Karakatiza666 opened a new pull request, #438: URL: https://github.com/apache/arrow-js/pull/438
This PR was co-authored with [Claude Code](https://claude.com/claude-code).

---

## Summary

This PR builds on the unresolved https://github.com/apache/arrow-js/pull/299 to implement full support for the `LargeList` data type in the Apache Arrow JavaScript bindings. `LargeList` uses 64-bit offsets (`BigInt64Array`) instead of 32-bit offsets, enabling list values larger than 2GB. Where possible, the code size was reduced by distilling helpers used in both `List` and `LargeList`.

## Related Issues

Closes #70

## Implementation Details

### Core Type System

- Added `Type.LargeList = 21` enum value
- Implemented `LargeList<T>` class with `BigInt64Array` offset support
- Added `DataType.isLargeList()` type guard
- Added `LargeListDataProps` interface and `MakeDataVisitor.visitLargeList` (widens 32-bit offsets via `toBigInt64Array`)
- Mapped `LargeList` and `LargeListBuilder` into `TypeToDataType`, `TypeToBuilder`, and `DataTypeToBuilder` in `interfaces.ts`

### Visitor Pattern Implementation

Wired `visitLargeList()` across every visitor, factoring shared helpers where the offset width was the only difference:

- `GetVisitor` / `SetVisitor`: merged `getList` / `setList` into single helpers using `bigIntToNumber` at the offset boundary, so one implementation covers both `List` and `LargeList`
- `IteratorVisitor`, `IndexOfVisitor`: register `visitLargeList` (the generic implementations are offset-width agnostic)
- `TypeComparator`: widened `compareList` to `List | LargeList` (structural comparison only)
- `VectorAssembler`: generalized `assembleListVector` to coerce begin/end via `bigIntToNumber`; registers `visitLargeList`
- `VectorLoader`: `visitLargeList` mirrors `visitList`; base `readOffsets` already honors `OffsetArrayType` (`BigInt64Array`)
- `JSONVectorAssembler`: emits `OFFSET` via `bigNumsToStrings`, matching the `LargeUtf8` / `LargeBinary` pattern
- `TypeAssembler` / `JSONTypeAssembler`: `FlatBuffers` + JSON type serialization

### IPC Support

- `ipc/metadata/message.ts`: `decodeFieldType` handles `Type.LargeList`
- Read and write paths both round-trip via the assembler/loader registrations above

### Latent Bug Fix

- `util/buffer.ts`: `rebaseValueOffsets` now coerces its number offset to `BigInt` when the offsets array is `BigInt64Array`. Previously a non-zero offset on a 64-bit offsets array would throw a `TypeError` on `bigint += number`. The fix is required for `LargeList` IPC writes on sliced data, and also fixes the same latent issue for `LargeUtf8` / `LargeBinary`.

### Builders

- New `src/builder/largelist.ts` (`LargeListBuilder`), mirroring `ListBuilder` with `BigInt()` for offset accumulation and `Number()` coercion when passing the start index to `child.set`
- Widened `VariableWidthBuilder` bound to include `LargeList` in `builder.ts`
- `GetBuilderCtor.visitLargeList` returns `LargeListBuilder`

### Testing

- `test/generate-test-data.ts`:
  - Factored a shared `generateListLike` helper used by both `generateList` (`Int32`) and `generateLargeList` (`BigInt64`)
  - Added `createVariableWidthOffsets64`; truncates `min` / `max` at entry so the fractional stride from `childVec.length / (length - nullCount)` doesn't throw a `RangeError` in `BigInt()`
- `test/unit/generated-data-tests.ts`: `LargeList` added to the matrix
- `test/unit/builders/builder-tests.ts`: `LargeListBuilder` entry added alongside `ListBuilder` / `FixedSizeListBuilder` / `MapBuilder`
- `test/unit/visitor-tests.ts`: `visitLargeList` added to `BasicVisitor` / `FeatureVisitor` and to both describe matrices

### Public API

- Exported `LargeList` and `LargeListBuilder` from `src/Arrow.ts` and `src/Arrow.dom.ts`

## Test Plan

All existing tests continue to pass, plus the `LargeList` path is exercised by:

- ✅ Generated-data matrix: `get` / `set` / `iterator` / `indexOf` / `slice` / `concat` / IPC round-trip
- ✅ Builder matrix: no-nulls / with-nulls / length=518
- ✅ Visitor dispatch (`BasicVisitor` + `FeatureVisitor`)
- ✅ IPC stream round-trip (16 IPC suites green, including JSON form via `JSONVectorAssembler` / `JSONVectorLoader`)

All tests across 45 suites pass. The tests were run with:

```bash
npx jest --config jestconfigs/jest.src.config.js
```

## Checklist

- [x] Implementation follows existing code patterns
- [x] All visitor methods implemented (`get` / `set` / `iterator` / `indexOf` / `TypeComparator` / `VectorAssembler` / `VectorLoader` / `JSONVectorAssembler` / `TypeAssembler` / `JSONTypeAssembler`)
- [x] IPC serialization/deserialization support added (binary + JSON form)
- [x] `LargeListBuilder` added and wired through `GetBuilderCtor` + `interfaces.ts`
- [x] Latent `rebaseValueOffsets` bigint bug fixed
- [x] Comprehensive tests added using existing test framework
- [x] All tests passing
- [x] Public API exports added
- [x] No breaking changes

## Notes

- This implementation provides full `LargeList` support: IPC read/write (binary + JSON form), in-memory access and mutation, type comparison, and construction via `LargeListBuilder`, parallel to the existing `List` type, just with 64-bit offsets.
- Storage and wire format are honest 64-bit (`BigInt64Array` end-to-end). The only narrowing happens at JS-runtime boundaries where `Data.slice` accepts number, identical to the `LargeUtf8` / `LargeBinary` policy upstream.
- Helpers were merged across `List` / `LargeList` only where the offset width was the sole difference and `bigIntToNumber` coercion at the boundary made the merge non-confusing; `LargeListBuilder` stays separate because the `BigInt()` / `Number()` coercions in `_flushPending` would obscure a merged version.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
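
For readers of this notification, the merged `getList` idea from the Visitor Pattern Implementation section can be sketched roughly as follows. This is not the PR's code: `offsetToNumber` is a hypothetical stand-in for `bigIntToNumber`, `listSlice` is an invented name, and a plain array stands in for the child vector. The sketch only shows how narrowing at the offset boundary lets one helper serve both 32-bit and 64-bit offsets.

```typescript
// Offsets are Int32Array for List and BigInt64Array for LargeList.
type Offsets = Int32Array | BigInt64Array;

// Hypothetical stand-in for `bigIntToNumber`: narrows an offset to a JS number
// and rejects bigint values that a number cannot represent exactly.
function offsetToNumber(value: number | bigint): number {
    if (typeof value === 'number') { return value; }
    if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
        throw new RangeError(`${value} is outside the safe integer range`);
    }
    return Number(value);
}

// One helper for both offset widths: the list at `index` spans
// child[offsets[index] .. offsets[index + 1]).
function listSlice<T>(child: readonly T[], offsets: Offsets, index: number): T[] {
    const begin = offsetToNumber(offsets[index]);
    const end = offsetToNumber(offsets[index + 1]);
    return child.slice(begin, end);
}

const child = [10, 20, 30, 40, 50];
const offsets32 = new Int32Array([0, 2, 2, 5]);
const offsets64 = new BigInt64Array([BigInt(0), BigInt(2), BigInt(2), BigInt(5)]);

console.log(listSlice(child, offsets32, 0)); // [ 10, 20 ]
console.log(listSlice(child, offsets64, 2)); // [ 30, 40, 50 ]
```

The design point is that only the two offset reads depend on the offset width; everything after the narrowing works on ordinary numbers, which matches the note above that `Data.slice` accepts number.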

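
The Latent Bug Fix entry can be illustrated with a small sketch of the failure mode and the kind of coercion described. This is an assumption-laden illustration, not the actual `rebaseValueOffsets` from `util/buffer.ts`: the name `rebaseOffsets` and its two-argument signature are invented here. It shows why the shift has to be converted to `BigInt` before it is added to entries of a `BigInt64Array`.

```typescript
type OffsetArray = Int32Array | BigInt64Array;

// Hypothetical helper: returns a copy of `offsets` with every entry shifted by
// `delta`, so that a sliced offsets buffer can start from zero again.
function rebaseOffsets(delta: number, offsets: OffsetArray): OffsetArray {
    if (delta === 0) { return offsets; }
    if (offsets instanceof BigInt64Array) {
        // Mixing bigint and number in `+=` throws a TypeError at runtime,
        // so coerce the shift once. BigInt() requires an integer argument.
        const shift = BigInt(delta);
        const out = offsets.slice();
        for (let i = 0; i < out.length; i++) { out[i] += shift; }
        return out;
    }
    const out = offsets.slice();
    for (let i = 0; i < out.length; i++) { out[i] += delta; }
    return out;
}

// Offsets of a slice that starts at child index 3, rebased to start at 0.
const sliced64 = new BigInt64Array([BigInt(3), BigInt(5), BigInt(9)]);
console.log(rebaseOffsets(-3, sliced64).join(',')); // 0,2,6
```

The same branch would apply to any type backed by 64-bit offsets, which is consistent with the note that the fix also covers `LargeUtf8` / `LargeBinary`.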