alamb commented on PR #9220:
URL: https://github.com/apache/arrow-rs/pull/9220#issuecomment-3799568377
> > Why wouldn't we just pack to Utf8View directly?
>
> Good question @alamb - Packing directly to Utf8View would require a dictionary builder for view types (dedup and incremental construction over a block/buffer-indexed view layout, plus view invariants like prefix/offset correctness).

I think you can use `StringViewBuilder` to do this: https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html

The one thing that might be tricky is knowing what the pre-existing index was.

Specifically: https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html#method.with_deduplicate_strings

I am not sure what you mean by "invariants like prefix/offset correctness".

> The two step path reuses the existing Dictionary(K, Utf8/Binary) packing, then reuses the existing cast machinery to produce Dictionary(K, Utf8View/BinaryView). When the dictionary values are Utf8/Binary and offsets fit, the cast can build views over the existing values buffer via append_block/view_from_dict_values (no extra value buffer copy).
>
> For Large* / oversized values, it can fall back to the general cast path (potentially copying into view blocks) instead of trying to force a zero copy view representation.
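
To make the "pre-existing index" concern concrete, here is a minimal std-only sketch of what a dictionary packer needs from deduplication: for every input value it must get back a key, and for a duplicate that key has to be the position of the earlier copy. `DedupKeys`, `intern`, and `pack_keys` are hypothetical names for illustration, not arrow-rs APIs, and the sketch stores plain `String`s rather than views. Whether `StringViewBuilder` with deduplication enabled exposes an equivalent of the returned key is exactly the open question above.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of arrow-rs): interns strings and reports
/// the dictionary key for each one, reusing the key of an earlier duplicate.
struct DedupKeys {
    /// Distinct values in first-seen order; position == dictionary key.
    values: Vec<String>,
    /// Maps each distinct value to its position in `values`.
    index: HashMap<String, u32>,
}

impl DedupKeys {
    fn new() -> Self {
        Self { values: Vec::new(), index: HashMap::new() }
    }

    /// Returns the key for `s`, appending to `values` only when `s` is unseen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&key) = self.index.get(s) {
            return key; // duplicate: hand back the pre-existing index
        }
        let key = self.values.len() as u32;
        self.values.push(s.to_owned());
        self.index.insert(s.to_owned(), key);
        key
    }
}

/// Packs `input` into (keys, distinct values), the shape a
/// Dictionary(K, V) array needs.
fn pack_keys(input: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut dedup = DedupKeys::new();
    let keys: Vec<u32> = input.iter().map(|s| dedup.intern(s)).collect();
    (keys, dedup.values)
}
```

A builder that deduplicates internally but only appends, without returning the slot it reused, would cover the `values` side of this sketch but not the `keys` side.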
