GitHub user dadepo closed a discussion: How to define Schema for embedded JSON fields
I am trying to get a better understanding of how to use schemas with DataFusion.
I have a JSON file with contents like this:
```
{"name":"Joe","status":{"role":"manager","salary":30000}}
```
I can define the schema for it and read the file as follows:
```
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::prelude::*;

// Runs inside an async function that returns a Result.
let ctx = SessionContext::new();
// Flat schema: the nested `status` object is declared as a plain string.
let schema = Schema::new(vec![
    Field::new("name", DataType::Utf8, false),
    Field::new("status", DataType::Utf8, false),
]);
let read_schemas = NdJsonReadOptions::default().schema(&schema);
let df = ctx.read_json("data/schema.json", read_schemas).await?;
df.show().await?;
```
Since the value of the `status` column is a nested JSON object, I would like to
define its schema instead of representing it as a string via `DataType::Utf8`.
So I tried the following schema:
```
let schema = Schema::new(vec![
    Field::new("name", DataType::Utf8, false),
    Field::new("status", DataType::Struct(vec![
        Field::new("role", DataType::Utf8, false),
        Field::new("salary", DataType::Int32, false),
    ]), false),
]);
```
But with this schema the read no longer works. It fails with the error:
```
Error: ArrowError(JsonError("expected { got string"))
```
This does not help much with debugging.
Any ideas what I am doing wrong? And how do I fully specify schemas for
data structures that are not flat?
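One possible reading of the error, sketched below as a toy model: the message looks like what a decoder would report when the schema says `Struct` but the value in some row is a quoted string rather than an object. The `DataType`, `Field`, `Json`, and `check` names here are hypothetical stand-ins written for illustration. They are not the arrow or DataFusion API, and the real decoder may behave differently.

```rust
// Toy model only: hypothetical stand-ins, not the arrow/DataFusion types.
enum DataType {
    Utf8,
    Int32,
    Struct(Vec<Field>),
}

struct Field {
    name: &'static str,
    data_type: DataType,
}

// A hand-built JSON value, so the sketch needs no parser or external crate.
enum Json {
    Str(&'static str),
    Num(i64),
    Object(Vec<(&'static str, Json)>),
}

fn kind(value: &Json) -> &'static str {
    match value {
        Json::Str(_) => "string",
        Json::Num(_) => "number",
        Json::Object(_) => "object",
    }
}

// Recursively checks that a value has the shape the schema declares.
fn check(value: &Json, expected: &DataType) -> Result<(), String> {
    match (expected, value) {
        (DataType::Utf8, Json::Str(_)) => Ok(()),
        (DataType::Int32, Json::Num(n))
            if *n >= i32::MIN as i64 && *n <= i32::MAX as i64 => Ok(()),
        (DataType::Struct(fields), Json::Object(members)) => {
            // Every declared field must be present and match its own type.
            for field in fields {
                let member = members
                    .iter()
                    .find(|(key, _)| *key == field.name)
                    .ok_or_else(|| format!("missing field {}", field.name))?;
                check(&member.1, &field.data_type)?;
            }
            Ok(())
        }
        // A struct was declared, but the value is not an object.
        (DataType::Struct(_), other) => Err(format!("expected {{ got {}", kind(other))),
        (DataType::Utf8, other) => Err(format!("expected string got {}", kind(other))),
        (DataType::Int32, other) => Err(format!("expected i32 got {}", kind(other))),
    }
}

fn main() {
    let schema = DataType::Struct(vec![
        Field { name: "name", data_type: DataType::Utf8 },
        Field {
            name: "status",
            data_type: DataType::Struct(vec![
                Field { name: "role", data_type: DataType::Utf8 },
                Field { name: "salary", data_type: DataType::Int32 },
            ]),
        },
    ]);

    // Same shape as the sample line above: `status` is a nested object.
    let nested = Json::Object(vec![
        ("name", Json::Str("Joe")),
        ("status", Json::Object(vec![
            ("role", Json::Str("manager")),
            ("salary", Json::Num(30000)),
        ])),
    ]);

    // Hypothetical row where `status` was written as a quoted string.
    let stringified = Json::Object(vec![
        ("name", Json::Str("Joe")),
        ("status", Json::Str("{\"role\":\"manager\",\"salary\":30000}")),
    ]);

    println!("{:?}", check(&nested, &schema));
    println!("{:?}", check(&stringified, &schema));
}
```

In this model the nested row passes and the stringified row is rejected with `expected { got string`. If the real reader works in a similar way, it may be worth checking whether any line in `data/schema.json` stores `status` as a quoted string instead of an object.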
GitHub link: https://github.com/apache/datafusion/discussions/5985