GitHub user dadepo closed a discussion: How to define Schema for embedded JSON 
fields

I am trying to get a better understanding of how to use schemas with 
DataFusion. I have a JSON file with contents like this:

```
{"name":"Joe","status":{"role":"manager","salary":30000}}
```

I can define a schema for it and read the file as follows:

```
    let schema = Schema::new(vec![
        Field::new("name", DataType::Utf8, false),
        Field::new("status", DataType::Utf8, false),
    ]);

    let read_schemas = NdJsonReadOptions::default().schema(&schema);
    let df = ctx.read_json("data/schema.json", read_schemas).await?;
    df.show().await?;
```

Since the value of the `status` column is a nested JSON object, I would like to 
define its schema instead of representing it as a string via `DataType::Utf8`. 

So I tried the following schema

```
    let schema = Schema::new(vec![
        Field::new("name", DataType::Utf8, false),
        Field::new("status", DataType::Struct(vec![
            Field::new("role", DataType::Utf8, false),
            Field::new("salary", DataType::Int32, false),
        ]), false),
    ]);
``` 

But with this schema, the read no longer works. It fails with the error:

```
Error: ArrowError(JsonError("expected { got string"))
```

This message does not help much with debugging. 

Any ideas what I am doing wrong? And how do I fully specify schemas for 
data structures that are not flat?
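
One way to narrow this down, assuming the error means that some record in the 
file holds `status` as a JSON string rather than an object, is to scan the 
file line by line before reading it. The sketch below uses only the Rust 
standard library. `value_kind` and `ValueKind` are hypothetical names 
introduced for illustration, the second sample record is made up, and the key 
lookup is a crude substring match rather than a real JSON parse.

```rust
/// Kind of JSON value found right after `"key":` on one NDJSON line.
#[derive(Debug, PartialEq)]
enum ValueKind {
    Object,
    String,
    Other,
    Missing,
}

/// Hypothetical helper: classify the value that follows the first `"key":`
/// on a line. This is a substring match, not a JSON parser, so it can be
/// fooled when the same quoted text appears as a value or in a nested object.
fn value_kind(line: &str, key: &str) -> ValueKind {
    let needle = format!("\"{}\"", key);
    let Some(pos) = line.find(&needle) else {
        return ValueKind::Missing;
    };
    // Skip whitespace, then require the `:` that separates key and value.
    let rest = line[pos + needle.len()..].trim_start();
    let Some(rest) = rest.strip_prefix(':') else {
        return ValueKind::Missing;
    };
    // The first non-whitespace character tells us the kind of value.
    match rest.trim_start().chars().next() {
        Some('{') => ValueKind::Object,
        Some('"') => ValueKind::String,
        Some(_) => ValueKind::Other,
        None => ValueKind::Missing,
    }
}

fn main() {
    // The first line is the record from the question. The second line is an
    // assumed example of a record whose `status` is a string.
    let data = r#"{"name":"Joe","status":{"role":"manager","salary":30000}}
{"name":"Ann","status":"{\"role\":\"dev\"}"}"#;

    for (i, line) in data.lines().enumerate() {
        println!("line {}: {:?}", i + 1, value_kind(line, "status"));
    }
}
```

Any line reported as `String` would be a candidate for the record that a 
`DataType::Struct` schema cannot decode.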




GitHub link: https://github.com/apache/datafusion/discussions/5985

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]