rtyler opened a new issue, #3875:
URL: https://github.com/apache/arrow-rs/issues/3875
**Which part is this question about**
I have been struggling to make `RecordBatch` objects with what I believe are
maps. I cannot seem to create the right icantation to write a record which
includes a map.
**Describe your question**
I have tried variables of `MapBuilder` and hand-crafting `MapArray`s but
today I broke down and just tried to write a record with a JSON decoded object
to parquet, for example:
```rust
let json = r#"{"ds" : "1", "timestamp" : 1, "status" : 200, "url" :
"https://",
"method" : "GET", "response" : "", "headers" : {"this" : "is a map"}}"#;
use arrow::json::reader::{Decoder, DecoderOptions};
use arrow::datatypes::Schema as ArrowSchema;
// don't mind this, this is just some Delta Lake schema translation
let schema: ArrowSchema = <ArrowSchema as
TryFrom<&Schema>>::try_from(&HttpRecord::schema()).unwrap();
let schema_ref = Arc::new(schema);
let value: serde_json::Value = serde_json::from_str(json).unwrap();
let vit = vec![value];
let mut vit = vit.iter().map(|v| Ok(v.to_owned()));
let options = DecoderOptions::new();
let d = Decoder::new(schema_ref, options);
let batch = d.next_batch(&mut vit).unwrap();
```
The `batch` then looks like:
```
RecordBatch { schema: Schema { fields: [Field { name: "ds", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false,[32/742]
a: {} }, Field { name: "timestamp", data_type: Timestamp(Microsecond, None),
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "status", data_type: Int32, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "url", data_type: Utf
8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "method", data_type: Utf8, nullable: false, dict_id: 0, dict
_is_ordered: false, metadata: {} }, Field { name: "response", data_type:
Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }
, Field { name: "headers", data_type: Map(Field { name: "key_value",
data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dic
t_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value",
data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metad
ata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata:
{} }, false), nullable: true, dict_id: 0, dict_is_ordered: false, meta
data: {} }], metadata: {} }, columns: [StringArray
[
"1",
], PrimitiveArray<Timestamp(Microsecond, None)>
[
1970-01-01T00:00:00.000001,
], PrimitiveArray<Int32>
[
200,
], StringArray
[
"https://",
], StringArray
[
"GET",
], StringArray
[
"",
], MapArray
[
StructArray
[
-- child 0: "key" (Utf8)
StringArray
[
"this",
]
-- child 1: "value" (Utf8)
StringArray
]], row_count: 1 }
```
Which when I attempt to write using `RecordBatchWriter` results in a panic:
```
thread 'model::tests::zip_batches' panicked at 'not implemented: Take not
supported for data type Map(Field { name: "key_value", data_type: Struct
([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf
8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]),
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, fals
e)',
/home/tyler/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-select-33.0.0/src/t
ake.rs:234:14
```
I would appreciate any examples whether using JSON or raw constructed
`RecordBatch` objects to get records with maps written properly to parquet
:frowning:
**Additional context**
This is with arrow 33 for what it's worth.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]