pkrack commented on issue #46007:
URL: https://github.com/apache/arrow/issues/46007#issuecomment-4213757087

   I investigated this a bit further, and #46007, #45262, and #44853 seem to 
have the same underlying issue: recursive construction / conversion does not 
automatically unwrap to the storage type and then rewrap as the extension type.
   
   Top-level construction with `pa.array` handles extension types via a special 
case in `python/pyarrow/array.pxi` [at line 
265](https://github.com/apache/arrow/blob/35fb62e6224617d5ae749533654e9f8c7a6250c7/python/pyarrow/array.pxi#L265).
   
   `ExtensionType`s nested inside container types (lists, structs, etc.) do 
not go through that special case. Instead, they are handled in 
[`python/pyarrow/src/arrow/python/python_to_arrow.cc`](https://github.com/apache/arrow/blob/main/python/pyarrow/src/arrow/python/python_to_arrow.cc),
 where `PyConverterTrait` is not implemented for `ExtensionType`s. Because of 
this, `MakeConverter` in 
[`cpp/src/arrow/util/converter.h`](https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/converter.h)
 falls back to the generic `Visit` implementation, which returns 
`Status::NotImplemented(t.name())` (cf. 
[`converter.h:L251`](https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/converter.h#L251)).
 This is what produces the observed `ArrowNotImplementedError: extension`.
   
   Related issue: there is no builder for extension types, i.e. you also 
cannot automatically create arrays with nested extension types in C++ through 
the Builder interface. The `Converter`s in `python_to_arrow.cc` typically use 
such a builder internally (see also `converter.h`).
   
   So what needs to be done here is to apply the "unwrap to storage type, then 
rewrap" trick that is already used in other parts of the code base.
   The question is where this should happen:
   1. In a builder: the converters then use these builders, and the top-level 
special case can be removed. I.e. a new builder + converter implementation for 
extension types -> nested extension types are then also supported in C++.
   2. In a converter, which then instantiates a builder for the storage type. 
I.e. a new converter class -> nested extension types would only be supported in 
Python.
   3. In the container types (list, map, etc.) -> requires changes to all 
container types. Perhaps some macro / template magic? This would mean more 
duplication, but on the other hand it follows the idea that seems to be 
reflected in the code base: consumers should work with the storage type.
   
   
   Workaround for Python: construct the extension-typed child array explicitly 
first, then use the corresponding `from_arrays` / `from_array` constructor, for 
example: `pa.FixedSizeListArray.from_arrays(pa.array(['{"a": 1}'], 
type=pa.json_()), type=pa.list_(pa.json_(), 1))`. Top-level construction works; 
automatic recursive construction does not.

