fotinosk opened a new pull request, #49658:
URL: https://github.com/apache/arrow/pull/49658
### Rationale for this change
Resolves enhancement request **#49505**.
Previously, attempting to convert a Python sequence into a Dictionary array
whose value type is `large_string` or `large_binary` failed with a
not-implemented error:
```
>>> import pyarrow as pa
>>> pa.array([], pa.dictionary(pa.int32(), pa.large_binary()))
Traceback (most recent call last):
...
ArrowNotImplementedError: DictionaryArray converter for type
dictionary<values=large_binary, indices=int32, ordered=0> not implemented
```
### What changes are included in this PR?
* **C++ Core (`cpp/src/arrow/util/converter.h`)**:
  * Added `DICTIONARY_CASE(LargeBinaryType)` and
    `DICTIONARY_CASE(LargeStringType)` to the
    `MakeConverterImpl::Visit(const DictionaryType&)` dispatch table so the
    C++ core knows how to route the large types.
  * Updated the `PyDictionaryConverter` template for
    `enable_if_has_string_view` to use `std::string_view` in the `Append`
    method. This allows the underlying Arrow builders to handle
    size-dispatching (32-bit vs 64-bit) internally.
* **Python Tests (`python/pyarrow/tests/test_array.py`)**:
  * Added `test_dictionary_large_string_and_binary` to verify sequence
    conversion for both `large_string` and `large_binary` dictionary types.
### Are these changes tested?
Yes. Added `test_dictionary_large_string_and_binary` to
`python/pyarrow/tests/test_array.py` which validates both the schema resolution
and the data integrity of the resulting `pylist`.
### Are there any user-facing changes?
Yes. Users can now pass `pa.large_string()` and `pa.large_binary()` as value
types into `pa.dictionary()` when using `pa.array()` to ingest Python sequences.