Hi,
On Sun, Feb 26, 2023 at 9:03 PM David Lacalle Castillo <[email protected]> wrote:
> Good afternoon,
>
> I have some parquet data that was created using Avro Parquet Writer of
> Java. This parquet includes the Avro Schema inside the key
> parquet.avro.schema of the metadata, I want to convert this parquet data
> back to Avro using this schema and programmed in Rust. I have tried the
> following code, but I couldn't get this working:
>
>
> let mut inputFile = File::open("test.parquet").unwrap();
> let builder =
> ParquetRecordBatchReaderBuilder::try_new(inputFile).unwrap();
>
> let avroSchema =
> builder.schema().metadata.get("parquet.avro.schema").unwrap();
>
> println!("Schema: {avroSchema}");
> let avroSchema = Schema::parse_str(avroSchema).unwrap();
>
This is the Avro schema! It is parsed once from the Parquet metadata and can be reused for every row.
>
> let mut reader = builder.build().unwrap();
>
>
> let jsonTemp = File::create("file.json").unwrap();
> // let buf: Vec<String> = Vec::new();
> let mut jsonWriter = arrow_json::LineDelimitedWriter::new(jsonTemp);
> for row in reader.into_iter() {
> jsonWriter.write(row.unwrap()).unwrap();
> jsonWriter.finish().unwrap();
> }
>
> let avroFile = File::create("res.avro").unwrap();
> let mut avroWriter = Writer::new(&avroSchema, avroFile);
>
> let jsonTemp = File::open("file.json").unwrap();
> for row in serde_json::Deserializer::from_reader(jsonTemp).into_iter()
> {
> let v: serde_json::Value = row.unwrap();
> let avroSchema = Schema::parse(&v).unwrap();
>
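This seems wrong!
&v is a row/record, not a schema, so Schema::parse(&v) is not expected to succeed here.
It also shadows the avroSchema parsed from the Parquet metadata above, which is the schema each row should be checked against.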
> let v: apache_avro::types::Value = v.into();
>
> println!("Valid: {}", v.validate(&avroSchema));
> //avroWriter.append(v).unwrap();
> }
>
> avroWriter.flush().unwrap();
>
> Any idea or advice? Has anyone tried to do the same?
>
Please share a demo application that we could use to debug the problem, e.g. a GitHub project.
>
> Thanks in advance!
>
> Best regards,
> David
>