[GitHub] [arrow-rs] rtyler opened a new issue, #3875: Is it possible to write a map?

via GitHub Thu, 16 Mar 2023 08:08:52 -0700


rtyler opened a new issue, #3875:
URL: https://github.com/apache/arrow-rs/issues/3875


   **Which part is this question about**
   
   I have been struggling to make `RecordBatch` objects with what I believe are 
maps. I cannot seem to find the right incantation to write a record which 
includes a map.
   
   **Describe your question**
   
   I have tried variations of `MapBuilder` and hand-crafting `MapArray`s, but 
today I broke down and just tried to write a record built from a JSON-decoded 
object to Parquet, for example:
   
   ```rust
       let json = r#"{"ds" : "1", "timestamp" : 1, "status" : 200, "url" : "https://",
       "method" : "GET", "response" : "", "headers" : {"this" : "is a map"}}"#;
       use arrow::json::reader::{Decoder, DecoderOptions};
       use arrow::datatypes::Schema as ArrowSchema;
       // don't mind this, this is just some Delta Lake schema translation
       let schema: ArrowSchema =
           <ArrowSchema as TryFrom<&Schema>>::try_from(&HttpRecord::schema()).unwrap();
       let schema_ref = Arc::new(schema);
       let value: serde_json::Value = serde_json::from_str(json).unwrap();
   
       let vit = vec![value];
       let mut vit = vit.iter().map(|v| Ok(v.to_owned()));
   
       let options = DecoderOptions::new();
       let d = Decoder::new(schema_ref, options);
       let batch = d.next_batch(&mut vit).unwrap();
   ```
   
   The `batch` then looks like:
   
   ```
   RecordBatch { schema: Schema { fields: [
     Field { name: "ds", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "timestamp", data_type: Timestamp(Microsecond, None), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "status", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "url", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "method", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "response", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
     Field { name: "headers", data_type: Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
   ], metadata: {} }, columns: [StringArray
   [
     "1",
   ], PrimitiveArray<Timestamp(Microsecond, None)>
   [
     1970-01-01T00:00:00.000001,
   ], PrimitiveArray<Int32>
   [
     200,
   ], StringArray
   [
     "https://",
   ], StringArray
   [
     "GET",
   ], StringArray
   [
     "",
   ], MapArray
   [
     StructArray
   [
   -- child 0: "key" (Utf8)
   StringArray
   [
     "this",
   ]
   -- child 1: "value" (Utf8)
   StringArray
   ]], row_count: 1 }
   ```
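
For readers unfamiliar with the `Map(Field { name: "key_value", data_type: Struct([key, value]) }, false)` type in that output: Arrow lays a map column out as a list of key/value structs, that is, one offsets buffer plus flattened key and value child arrays. The following is a minimal std-only sketch of that layout, not the arrow-rs API. The `MapColumn` type and its methods are hypothetical names used for illustration, and it uses `usize` offsets for simplicity where Arrow uses 32-bit offsets.

```rust
/// Hypothetical stand-in for a map column; NOT the arrow-rs `MapArray` type.
/// Row `i` owns the entries at indices `offsets[i]..offsets[i + 1]`.
struct MapColumn {
    offsets: Vec<usize>,
    keys: Vec<String>,           // keys are non-nullable
    values: Vec<Option<String>>, // values are nullable, as in the schema above
}

impl MapColumn {
    fn new() -> Self {
        // The offsets buffer always starts with 0 and has len() + 1 entries.
        MapColumn { offsets: vec![0], keys: Vec::new(), values: Vec::new() }
    }

    /// Append one row (one map) by flattening its entries into the child vectors.
    fn append_row(&mut self, entries: &[(&str, Option<&str>)]) {
        for &(k, v) in entries {
            self.keys.push(k.to_string());
            self.values.push(v.map(|s| s.to_string()));
        }
        self.offsets.push(self.keys.len());
    }

    /// Collect the key/value pairs belonging to row `i`.
    fn row(&self, i: usize) -> Vec<(&str, Option<&str>)> {
        let (start, end) = (self.offsets[i], self.offsets[i + 1]);
        (start..end)
            .map(|j| (self.keys[j].as_str(), self.values[j].as_deref()))
            .collect()
    }

    /// Number of rows (maps) in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

fn main() {
    let mut headers = MapColumn::new();
    headers.append_row(&[("this", Some("is a map"))]);
    headers.append_row(&[]); // an empty map still advances the offsets buffer
    println!("rows: {}, first row: {:?}", headers.len(), headers.row(0));
}
```

Under this model, the single `headers` row from the JSON above would be one key (`"this"`), one value (`"is a map"`), and offsets `[0, 1]`.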
   
   Attempting to write this batch using `RecordBatchWriter` results in a panic:
   
   ```
   thread 'model::tests::zip_batches' panicked at 'not implemented: Take not supported for data type Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)', /home/tyler/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-select-33.0.0/src/take.rs:234:14
   ```
   
   I would appreciate any examples, whether using JSON or hand-constructed 
`RecordBatch` objects, of getting records with maps written properly to Parquet 
:frowning: 
   
   **Additional context**
   
   This is with arrow 33 for what it's worth.

