fotinosk opened a new pull request, #49658:
URL: https://github.com/apache/arrow/pull/49658

   ### Rationale for this change
   Resolves enhancement request **#49505**.
   
   Previously, converting a Python sequence into a dictionary array whose value 
type is `large_binary` or `large_string` failed with `ArrowNotImplementedError`: 
   
   ```
   >>> import pyarrow as pa
   >>> pa.array([], pa.dictionary(pa.int32(), pa.large_binary()))
   Traceback (most recent call last):
     ...
   ArrowNotImplementedError: DictionaryArray converter for type 
dictionary<values=large_binary, indices=int32, ordered=0> not implemented
   ```
   
   ### What changes are included in this PR?
   * **C++ Core (`cpp/src/arrow/util/converter.h`)**: 
     * Added `DICTIONARY_CASE(LargeBinaryType)` and 
`DICTIONARY_CASE(LargeStringType)` to the `MakeConverterImpl::Visit(const 
DictionaryType&)` dispatch so that dictionaries with large value types are 
routed to a converter instead of hitting the "not implemented" error.
     * Updated the `PyDictionaryConverter` template for 
`enable_if_has_string_view` so that its `Append` method passes a 
`std::string_view`. The underlying Arrow builders can then handle the 32-bit 
vs 64-bit size difference internally.
   * **Python Tests (`python/pyarrow/tests/test_array.py`)**: 
     * Added `test_dictionary_large_string_and_binary` to verify sequence 
conversion for both `large_string` and `large_binary` dictionary types.
   
   ### Are these changes tested?
   Yes. Added `test_dictionary_large_string_and_binary` to 
`python/pyarrow/tests/test_array.py`, which checks both the resolved array type 
and the contents of the resulting `pylist`.
   
   ### Are there any user-facing changes?
   Yes. Users can now pass `pa.large_string()` and `pa.large_binary()` as value 
types into `pa.dictionary()` when using `pa.array()` to ingest Python sequences.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
