Dave Hirschfeld created ARROW-5568:
--------------------------------------

             Summary: [Python] Allow parsing more general JSON formats
                 Key: ARROW-5568
                 URL: https://issues.apache.org/jira/browse/ARROW-5568
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Dave Hirschfeld


I have JSON data where the columnar (line-delimited) part is in a `data` subkey:
{code:java}
{
  "metadata": {"name": "block1"},
  "data" : [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}
{code}
 
It would be good if the Arrow JSON parser allowed specifying where the 
columnar data is stored.

Since the `metadata` is also important to me, it would be even better if the 
rest of the JSON could be returned as a Python dict, with only the specified 
keys parsed as Arrow tables, e.g.

{code:java}
>>> from pyarrow import json
>>> block1 = json.read_json(fn, tables=['data'])
>>> block1['data']
pyarrow.Table
a: int64
b: double
c: string
d: bool
>>> block1['metadata']
{'name': 'block1'}
>>> block1
{
  "metadata": {"name": "block1"},
  "data" : pyarrow.Table
}
{code}
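
For comparison, a minimal sketch of how the requested behaviour can be approximated today in pure Python. The helper name `read_json_with_tables` and its `to_table` parameter are hypothetical and not part of pyarrow; the default converter assumes a recent pyarrow that provides `Table.from_pylist`. Unlike a native implementation, this parses the whole document with the standard-library `json` module first, so it does not benefit from Arrow's own JSON reader.

```python
import json


def read_json_with_tables(path, tables, to_table=None):
    """Hypothetical helper (not a pyarrow API): load a JSON document and
    convert only the top-level keys listed in `tables` to Arrow tables."""
    if to_table is None:
        # Imported lazily so a different converter can be supplied instead.
        # Assumption: pyarrow >= 7.0, which provides Table.from_pylist.
        import pyarrow as pa
        to_table = pa.Table.from_pylist

    with open(path) as f:
        doc = json.load(f)  # the whole document is parsed in Python

    for key in tables:
        # Each listed key is expected to hold a list of row objects (dicts).
        doc[key] = to_table(doc[key])
    return doc
```

Called as `read_json_with_tables(fn, tables=['data'])`, this would return a dict in which `metadata` is left as a plain Python dict and `data` is replaced by the converted table.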
 
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
