[ 
https://issues.apache.org/jira/browse/ARROW-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488916#comment-17488916
 ] 

Antoine Pitrou commented on ARROW-15613:
----------------------------------------

There is actually an ongoing discussion about relaxing the UTF-8 requirement 
for IPC metadata values (see the message recently posted by 
[~jorisvandenbossche], "Re: [DISCUSS] Binary Values in Key value pairs WAS: 
Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams").

In short: yes, Arrow C++ and PyArrow can put arbitrary binary data in metadata 
values.
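
For illustration, here is a minimal sketch (standard library only) that checks 
the 28 value bytes quoted in the issue description below. It assumes the bytes 
were copied correctly from the report; the helper name `is_valid_utf8` is 
hypothetical and not part of Arrow. It suggests the value is a Python pickle 
stream rather than text, which would explain why it is not valid UTF-8:

```python
import pickletools

# Value bytes for the "ARROW:extension:metadata" key, copied from the issue.
raw = bytes([128, 3, 99, 116, 101, 115, 116, 95, 115, 113, 108, 10, 85, 117,
             105, 100, 84, 121, 112, 101, 10, 113, 0, 41, 82, 113, 1, 46])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(len(raw))            # matches the reported "value len: 28"
print(is_valid_utf8(raw))  # False: the leading byte 0x80 cannot start a UTF-8 sequence

# List the pickle opcodes without unpickling (nothing is imported or executed).
opcodes = [op.name for op, arg, pos in pickletools.genops(raw)]
print(opcodes)
```

`pickletools.genops` is used instead of `pickle.loads` so that inspecting the 
payload never imports or runs the referenced class.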

Also cc [~lidavidm]  [~emkornfield] 

> [C++][Python] Metadata from C data interface is not valid utf8
> --------------------------------------------------------------
>
>                 Key: ARROW-15613
>                 URL: https://issues.apache.org/jira/browse/ARROW-15613
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Jorge Leitão
>            Priority: Major
>
> While trying to roundtrip an extension from schema.metadata (see ARROW-13855 
> for details), I got invalid utf8, which imo goes against
> > A binary string describing the type’s metadata [1]
> Specifically, a field
> field = pyarrow.field("aa", UuidType())
> contains the following:
> ```
> key len: 20
> key: "ARROW:extension:name"
> value len: 23
> value: "arrow.py_extension_type"
> key len: 24
> key: "ARROW:extension:metadata"
> value len: 28
> ```
> with the value's data for this key being:
> ```
> [128, 3, 99, 116, 101, 115, 116, 95, 115, 113, 108, 10, 85, 117, 105, 100, 
> 84, 121, 112, 101, 10, 113, 0, 41, 82, 113, 1, 46]
> ```
> This is not valid UTF-8 (see e.g. 
> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=02b67658b3cddf8dc095bc9750fa7032).
> Maybe I am reading the values incorrectly? (null point?)
> [1] 
> https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
