[
https://issues.apache.org/jira/browse/ARROW-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488936#comment-17488936
]
Joris Van den Bossche commented on ARROW-15613:
-----------------------------------------------
(Side note: this might be just for quick testing, but if you actually want to
use the extension type on the Rust side as well, you should probably define it
in Python as a subclass of {{pyarrow.ExtensionType}} rather than
{{pyarrow.PyExtensionType}}. The latter uses a pickle dump of the class as the
serialized metadata, which you won't be able to use in Rust, I suppose.)
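
To illustrate the pickle point, here is a minimal sketch using only the Python standard library. It takes the 28 metadata bytes quoted in the issue below, checks that they are not valid UTF-8, and lists their pickle opcodes without unpickling anything. The names {{METADATA}}, {{is_valid_utf8}} and {{pickle_opcodes}} are made up for this example and are not part of pyarrow.

```python
import pickletools

# The "ARROW:extension:metadata" value bytes, copied from the issue below.
METADATA = bytes([
    128, 3, 99, 116, 101, 115, 116, 95, 115, 113, 108, 10, 85, 117, 105, 100,
    84, 121, 112, 101, 10, 113, 0, 41, 82, 113, 1, 46,
])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def pickle_opcodes(data: bytes):
    """Return (opcode name, argument) pairs without executing the pickle."""
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]


if __name__ == "__main__":
    # The leading byte 0x80 is the pickle PROTO opcode and is never a valid
    # first byte of a UTF-8 sequence.
    print("valid UTF-8:", is_valid_utf8(METADATA))
    for name, arg in pickle_opcodes(METADATA):
        print(name, arg)
```

If this reading is right, the GLOBAL opcode names the {{test_sql}} module and the {{UuidType}} class, so the value looks like a protocol 3 pickle of the Python class rather than text, which would explain why it cannot be interpreted as UTF-8 on the Rust side.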
> [C++][Python] Metadata from C data interface is not valid utf8
> --------------------------------------------------------------
>
> Key: ARROW-15613
> URL: https://issues.apache.org/jira/browse/ARROW-15613
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Jorge Leitão
> Priority: Major
>
> While trying to round-trip an extension from schema.metadata (see ARROW-13855
> for details), I got invalid UTF-8, which imo goes against
> > A binary string describing the type’s metadata [1]
> Specifically, a field
> field = pyarrow.field("aa", UuidType())
> contains the following:
> ```
> key len: 20
> key: "ARROW:extension:name"
> value len: 23
> value: "arrow.py_extension_type"
> key len: 24
> key: "ARROW:extension:metadata"
> value len: 28
> ```
> with the value's data for this key being:
> ```
> [128, 3, 99, 116, 101, 115, 116, 95, 115, 113, 108, 10, 85, 117, 105, 100,
> 84, 121, 112, 101, 10, 113, 0, 41, 82, 113, 1, 46]
> ```
> This is not valid UTF-8 (see e.g.
> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=02b67658b3cddf8dc095bc9750fa7032).
> Maybe I am reading the values incorrectly? (null point?)
> [1]
> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata