aboderinsamuel commented on PR #50145:
URL: https://github.com/apache/arrow/pull/50145#issuecomment-4685480173

   Thanks @AlenkaF! Happy to dig in, agreed these are better hashed out up 
front. My take, keeping this PR scoped to FixedShapeTensor and treating the 
rest as follow-ups:
   
   1. Other canonical extension types
   I think the same idea fits them, but the right pandas dtype isn't always 
ArrowDtype. pd.ArrowDtype(self) is a good generic default (faithful, 
round-trips, no extra plumbing), yet some types have a more natural mapping 
worth weighing case by case, e.g. bool8 → a boolean dtype, while 
json/uuid/opaque could stay ArrowDtype. So I'd lean toward implementing it per 
canonical type (defaulting to ArrowDtype, overriding where a native dtype is 
clearly better) rather than one blanket implementation on BaseExtensionType. 
That keeps each behavior change small and reviewable, and avoids silently 
changing to_pandas for user-defined extension types — which feels more like the 
territory of the as_py fallback in 
[#33134](https://github.com/aboderinsamuel/arrow/issues/33134).
   
   2. Implications for to_pandas
   Returning a dtype with __from_arrow__ does change to_pandas/Table.to_pandas 
for that type: instead of the current fallback (storage → object/numpy; e.g. 
tensors become an object column of flattened ndarrays), you get a faithful 
extension-typed column. What I see:
   
   ✅ round-trips: to_pandas → from_pandas now preserves the extension type 
(object dtype loses it).
   ✅ split_blocks=True works instead of raising KeyError (the original bug).
   ⚠️ user-facing change: anyone relying on the old object/numpy output now 
gets ArrowDtype — needs a changelog note.
   types_mapper still takes precedence (it's checked before to_pandas_dtype in 
both _get_extension_dtypes and _array_like_to_pandas), so explicit user 
mappings are unaffected.
   Only engages on pandas ≥ 2.1 (reliable ArrowDtype extension blocks, 
GH-35821); older pandas keeps today's fallback. Per-type rollout + the version 
gate keep the blast radius controlled.
   3. Docstrings
   Agreed, worth doing. BaseExtensionType and its subclasses currently inherit 
to_pandas_dtype (and friends) from DataType with no mention of extension 
behavior. I'd be glad to take a follow-up documenting the 
to_pandas_dtype/to_pandas behavior on BaseExtensionType and per-type specifics.
   
   If it helps, I can open a tracking issue capturing this broader direction 
(canonical-type mappings + docstring cleanup) so it isn't buried here, and keep 
this PR focused on FixedShapeTensor. Does that split sound right to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to