Dave Hirschfeld created ARROW-5568:
--------------------------------------

             Summary: [Python] Allow parsing more general JSON formats
                 Key: ARROW-5568
                 URL: https://issues.apache.org/jira/browse/ARROW-5568
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Dave Hirschfeld

I have JSON data where the columnar (line-delimited) part is in a `data` subkey:
{code:java}
{
  "metadata": {"name": "block1"},
  "data": [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}
{code}
It would be good if the Arrow JSON parser could allow specifying where the columnar data is stored.

Since the `metadata` is also important to me, it would be even better if the rest of the JSON could be returned as a Python dict, with only the specified keys parsed as Arrow tables, e.g.
{code:java}
>>> block1 = json.read_json(fn, tables=['data'])
>>> block1['data']
pyarrow.Table
a: int64
b: double
c: string
d: bool
>>> block1['metadata']
{'name': 'block1'}
>>> block1
{
  "metadata": {"name": "block1"},
  "data": pyarrow.Table
}
{code}
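
Until something like the proposed `tables=` argument exists, the requested behaviour can be approximated in plain Python. The following is only a sketch of that workaround, not the Arrow reader's API: `load_with_tables`, `rows_to_columns`, and the `to_table` parameter are hypothetical names introduced here for illustration. It parses the whole document with the standard `json` module, converts only the listed keys, and leaves everything else as ordinary Python objects.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of row dicts into a dict of column lists.

    Keys missing from a row are filled with None.
    """
    names: List[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return {name: [row.get(name) for row in rows] for name in names}


def load_with_tables(
    text: str,
    tables: Iterable[str],
    to_table: Callable[[List[Dict[str, Any]]], Any] = rows_to_columns,
) -> Dict[str, Any]:
    """Parse a JSON document and convert only the keys listed in `tables`.

    `to_table` is a hypothetical hook: any callable that accepts a list of
    row dicts. All other keys stay as plain Python objects.
    """
    doc = json.loads(text)
    for key in tables:
        doc[key] = to_table(doc[key])
    return doc


raw = """
{
  "metadata": {"name": "block1"},
  "data": [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}
"""

block1 = load_with_tables(raw, tables=["data"])
print(block1["metadata"])
print(block1["data"]["a"])
```

With pyarrow installed, passing `to_table=pyarrow.Table.from_pylist` (present in recent pyarrow releases) should produce a `pyarrow.Table` for `data` in place of the dict of lists. Because this route goes through Python objects rather than Arrow's native JSON reader, it is a stopgap and not the feature requested above.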