jorisvandenbossche commented on issue #39010: URL: https://github.com/apache/arrow/issues/39010#issuecomment-1835649581
The reason a list of tuples was used as the default in the past is that a dict cannot always represent the data of a map array, because a map allows duplicate keys. But of course very often that is not an issue and you want dicts. For that reason, a new keyword `maps_as_pydicts` was recently added (https://github.com/apache/arrow/pull/34730), but only to the conversion path to pandas.

Using your example:

```python
import pyarrow as pa

schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
data = [{'x': {'a': 1}}]
batch = pa.RecordBatch.from_pylist(data, schema=schema)
```

this can now be converted to a pandas DataFrame (or numpy array) using dicts:

```
In [8]: batch.to_pandas(maps_as_pydicts="strict")
Out[8]: 
          x
0  {'a': 1}
```

However, the `to_pylist()` method takes a different code path, where this is not yet supported. Under the hood, it just calls `as_py()` on the individual MapScalar objects in a for loop. But I think it certainly makes sense to add the same keyword to this scalar conversion code path as well.
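Until `to_pylist()` supports such a keyword, one possible workaround is to post-process its output in plain Python. The sketch below is only an illustration: `map_pairs_to_dict` and its `on_duplicate` argument are hypothetical names, not part of pyarrow, and the `"strict"` / `"lossy"` behaviour is modelled on how I understand `maps_as_pydicts` to work in the pandas path (raise on duplicate keys, or warn and keep the last value). The `rows` literal stands in for what `batch.to_pylist()` returns for the example above, so the snippet runs without pyarrow.

```python
import warnings


def map_pairs_to_dict(pairs, on_duplicate="strict"):
    """Convert one map value, given as a list of (key, value) tuples, to a dict.

    Hypothetical helper, not part of pyarrow.
    on_duplicate="strict" raises on a repeated key; any other value
    (e.g. "lossy") warns and keeps the last value seen.
    """
    if pairs is None:  # a null map entry stays None
        return None
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "strict":
                raise ValueError(f"duplicate map key: {key!r}")
            warnings.warn(f"duplicate map key {key!r}: keeping the last value")
        result[key] = value
    return result


# Stand-in for batch.to_pylist(): one dict per row, with the map column
# represented as a list of (key, value) tuples.
rows = [{"x": [("a", 1)]}]

converted = [{**row, "x": map_pairs_to_dict(row["x"])} for row in rows]
print(converted)  # [{'x': {'a': 1}}]
```

Raising by default keeps the conversion from silently dropping data when a map really does contain duplicate keys, which is the case the list-of-tuples representation exists for.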
