Good afternoon,

I have some Parquet data that was written with the Java
AvroParquetWriter. The file stores its Avro schema in the metadata under
the key parquet.avro.schema. I want to convert this Parquet data back to
Avro in Rust, using that embedded schema. I have tried the following
code, but I couldn't get it working:


    use std::fs::File;

    use apache_avro::{Schema, Writer};
    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

    let input_file = File::open("test.parquet").unwrap();
    let builder = ParquetRecordBatchReaderBuilder::try_new(input_file).unwrap();

    // The Avro schema is stored as a JSON string in the file metadata.
    let avro_schema_json = builder.schema().metadata.get("parquet.avro.schema").unwrap();

    println!("Schema: {avro_schema_json}");
    let avro_schema = Schema::parse_str(avro_schema_json).unwrap();

    let reader = builder.build().unwrap();

    // Dump every record batch to a temporary line-delimited JSON file.
    let json_temp = File::create("file.json").unwrap();
    let mut json_writer = arrow_json::LineDelimitedWriter::new(json_temp);
    for batch in reader {
        // Recent arrow-json releases take the batch by reference.
        json_writer.write(&batch.unwrap()).unwrap();
    }
    // Finish once, after the last batch, not inside the loop.
    json_writer.finish().unwrap();

    let avro_file = File::create("res.avro").unwrap();
    let mut avro_writer = Writer::new(&avro_schema, avro_file);

    let json_temp = File::open("file.json").unwrap();
    for row in serde_json::Deserializer::from_reader(json_temp).into_iter::<serde_json::Value>() {
        // A data row is not an Avro schema, so it is validated against the
        // schema read from the Parquet metadata instead of being parsed as one.
        let v: apache_avro::types::Value = row.unwrap().into();

        println!("Valid: {}", v.validate(&avro_schema));
        //avro_writer.append(v).unwrap();
    }

    avro_writer.flush().unwrap();

Any ideas or advice? Has anyone tried to do the same?

Thanks in advance!

Best regards,
David
