jorisvandenbossche commented on issue #39010:
URL: https://github.com/apache/arrow/issues/39010#issuecomment-1835649581

   The reason that a list of tuples was used as default in the past is that a 
dict cannot always represent the data of a map array, because a map allows 
duplicate keys. 
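
To make the duplicate-key point concrete, here is a minimal pure-Python sketch. It does not use pyarrow; the `entries` list is a hand-written stand-in for what one map value looks like as a list of tuples, and the duplicate `"a"` key is an assumption chosen for illustration.

```python
# One map value as a list of (key, value) tuples.
# The key "a" appears twice, which a map type permits.
entries = [("a", 1), ("a", 2), ("b", 3)]

# Building a dict keeps only the last value seen for each key,
# so the first ("a", 1) entry is silently dropped.
as_dict = dict(entries)

print(len(entries))  # 3
print(as_dict)       # {'a': 2, 'b': 3}
```

The list of tuples round-trips every entry, while the dict loses one of them.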
   
   But of course very often that is not an issue and you want dicts. And 
therefore a new keyword `maps_as_pydicts` was recently added 
(https://github.com/apache/arrow/pull/34730), but only to the conversion path 
to pandas. 
   
   Using your example:
   
   ```python
   import pyarrow as pa

   schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
   data = [{'x': {'a': 1}}]
   batch = pa.RecordBatch.from_pylist(data, schema=schema)
   ```
   
   this can now be converted to a pandas DataFrame (or numpy array) using dicts:
   
   ```
   In [8]: batch.to_pandas(maps_as_pydicts="strict")
   Out[8]: 
             x
   0  {'a': 1}
   ```
   
   However, the `to_pylist()` method takes a different code path, where this is 
not yet supported. Under the hood, this just calls `as_py()` on the individual 
MapScalar objects in a for loop. 
   But I think it certainly makes sense to add this same keyword to the scalar 
conversion code path as well.
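
Until such a keyword exists for `to_pylist()`, one possible workaround is to post-process its result in plain Python. The sketch below is not part of pyarrow: `map_entries_to_dict` and its `strict` flag are hypothetical names, and `rows` is a hand-written stand-in for the list-of-tuples output described above for the example batch.

```python
def map_entries_to_dict(entries, strict=True):
    """Hypothetical helper: convert [(key, value), ...] into a dict.

    With strict=True a duplicate key raises instead of being overwritten.
    """
    if entries is None:  # a null map value stays None
        return None
    result = {}
    for key, value in entries:
        if strict and key in result:
            raise KeyError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


# Stand-in for the to_pylist() result of the example batch (assumed shape).
rows = [{"x": [("a", 1)]}]

converted = [{"x": map_entries_to_dict(row["x"])} for row in rows]
print(converted)  # [{'x': {'a': 1}}]
```

The `strict` flag here only loosely mirrors the `"strict"` value passed to `to_pandas` above; it is this sketch's own choice for handling duplicate keys.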


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
