Good afternoon,
I have some Parquet data that was created with Java's AvroParquetWriter. The
file embeds the Avro schema under the parquet.avro.schema key of the file
metadata, and I want to convert the data back to Avro in Rust using that
schema. I have tried the following code, but I couldn't get it working:
    let input_file = File::open("test.parquet").unwrap();
    let builder = ParquetRecordBatchReaderBuilder::try_new(input_file).unwrap();

    // The Java writer stores the Avro schema in the Parquet key/value
    // metadata, which arrow-rs exposes on the decoded Arrow schema.
    let avro_schema = builder
        .schema()
        .metadata()
        .get("parquet.avro.schema")
        .unwrap();
    println!("Schema: {avro_schema}");
    let avro_schema = Schema::parse_str(avro_schema).unwrap();

    // Dump the record batches to line-delimited JSON as an intermediate step.
    let reader = builder.build().unwrap();
    let json_tmp = File::create("file.json").unwrap();
    let mut json_writer = arrow_json::LineDelimitedWriter::new(json_tmp);
    for batch in reader {
        json_writer.write(&batch.unwrap()).unwrap();
    }
    json_writer.finish().unwrap();

    // Read the JSON back and try to turn each row into an Avro value.
    let avro_file = File::create("res.avro").unwrap();
    let mut avro_writer = Writer::new(&avro_schema, avro_file);
    let json_tmp = File::open("file.json").unwrap();
    for row in serde_json::Deserializer::from_reader(json_tmp).into_iter() {
        let v: serde_json::Value = row.unwrap();
        let v: apache_avro::types::Value = v.into();
        // This prints "Valid: false" for me, so the append is commented out:
        println!("Valid: {}", v.validate(&avro_schema));
        // avro_writer.append(v).unwrap();
    }
    avro_writer.flush().unwrap();
Any ideas or advice? Has anyone tried to do the same?
Thanks in advance!
Best regards,
David