oleksandr-yatsuk commented on issue #14116:
URL: https://github.com/apache/arrow/issues/14116#issuecomment-1249388533

   The Java DTO
   
   ```
   import java.util.Map;

   public class VideoScoreUpdated {
       public String id;
       public Map<String, String> tags;
   }
   ```
   serializes to JSON as an object:
   ```
   {
       "id": "1",
       "tags": {
           "tag1": "value1",
           "tag2": "value2"
       }
   }
   ```
   In Python, it deserializes as a `dict` object, not as an array of tuples:
   
   ```
   {
       "id": "1",
       "tags": {
           "tag1": "value1",
           "tag2": "value2"
       }
   }
   ```
   
   Normally, a `Map` field in any language (Java, Scala, C#, etc.) serializes to JSON as an object, not as an array of JSON tuples/objects.
   That's why I would expect pyarrow to work the same way: `JSON object -> Python dict -> pyarrow map`.
   Manually converting every Python `dict` into an array of tuples is a super hard job.
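
To make the manual step concrete, here is a minimal sketch of the conversion for a flat record. The helper name `dict_fields_to_pairs` and the `map_fields` argument are hypothetical, not part of pyarrow. The caller has to know in advance which fields are maps, and nested maps would need a recursive version.

```python
def dict_fields_to_pairs(record, map_fields):
    """Return a shallow copy of `record` in which every field named in
    `map_fields` is turned from a dict into a list of (key, value) tuples."""
    converted = dict(record)  # leave the caller's record untouched
    for name in map_fields:
        value = converted.get(name)
        if isinstance(value, dict):
            converted[name] = list(value.items())
    return converted


event = {"id": "1", "tags": {"tag1": "value1", "tag2": "value2"}}
print(dict_fields_to_pairs(event, ["tags"]))
```

Even in this small form, the list of map fields duplicates information that a schema could carry.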
   
   Alternatively, the pyarrow schema could control whether a `dict` field becomes a map or a struct.
   
   For example:
   
   ```
   from pyarrow import field, map_, schema, string, struct
   
   tags_updated = {
       "id": "1",
       "tags": {
           "tag1": "value1",
           "tag2": "value2"
       },
       "user": {
           "id": "user-1",
           "country": "ES"
       }
   }
   
   # Proposed: the schema decides which dict becomes a map and which a struct
   tags_updated_schema = schema([
       field("id", string(), False),
       field("tags", map_(string(), string()), False),
       field("user", struct([
           field("id", string(), False),
           field("country", string(), False)
       ]), False)
   ])
   ```
   
   Correct me if I'm wrong, but is a pyarrow `struct` currently the only type a schema can assign to a Python `dict` field?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
