Hi,
The data I am working with has the following schemas:
{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Profile",
"fields": [
{"name": "nest1", "type": [{ "type": "array", "items": "Nest1"}, "null"]},
{"name": "nest2", "type": [{ "type": "array", "items": "Nest2"}, "null"]},
{"name": "idd", "type": "long"} ]
}
{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest1",
"fields": [
{"name": "ts", "type": "int"},
{"name": "mpi", "type": "int"},
{"name": "api1", "type": ["int", "null"], "default": -1},
{"name": "api2", "type": ["long", "null"], "default": -1},
{"name": "api3", "type": ["int", "null"], "default": -1} ]
}
{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest2",
"fields": [
{"name": "nest1", "type": "Nest1"},
{"name": "ts", "type": "int"} ]
}
Basically, the Profile record contains two nested arrays, nest1 and nest2,
and each Nest2 element has a member of type Nest1. I am trying to read the
data with the following projection schema:
{
  "type": "record",
  "name": "Profile",
  "namespace": "avro.parquet.model",
  "fields": [
    {
      "name": "nest2",
      "type": [
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "Nest2",
            "fields": [
              {
                "name": "nest1",
                "type": {
                  "type": "record",
                  "name": "Nest1",
                  "fields": [
                    {
                      "name": "mpi",
                      "type": "int"
                    },
                    {
                      "name": "ts",
                      "type": "int"
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  ]
}
But I get the error message:
org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'mpi' not found.
    at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:133)
    ....
While checking the code, I found that
org.apache.parquet.avro.AvroIndexedRecordConverter.AvroArrayConverter.isElementType(Type, Schema)
returns false, even though its two arguments (i.e., the Parquet type and the
Avro schema) appear to be compatible. Is this a bug, or am I missing
something here?
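To make "compatible" concrete, here is a minimal sketch in plain Python (standard library only) that checks whether every field named in the projection exists, with the same type, in the full schema. It is not Parquet's own check and does not reproduce isElementType. The helper names `unwrap` and `is_projection_of` are hypothetical, the schemas are abbreviated copies of the ones above (defaults and namespace omitted), and unions with more than one non-null branch are not handled.

```python
# Full (writer) schemas, abbreviated from the ones in this message.
# Named types are looked up in NAMED rather than resolved by an Avro parser.
NEST1 = {"type": "record", "name": "Nest1", "fields": [
    {"name": "ts", "type": "int"},
    {"name": "mpi", "type": "int"},
    {"name": "api1", "type": ["int", "null"]},
    {"name": "api2", "type": ["long", "null"]},
    {"name": "api3", "type": ["int", "null"]},
]}
NEST2 = {"type": "record", "name": "Nest2", "fields": [
    {"name": "nest1", "type": "Nest1"},
    {"name": "ts", "type": "int"},
]}
PROFILE = {"type": "record", "name": "Profile", "fields": [
    {"name": "nest1", "type": [{"type": "array", "items": "Nest1"}, "null"]},
    {"name": "nest2", "type": [{"type": "array", "items": "Nest2"}, "null"]},
    {"name": "idd", "type": "long"},
]}
NAMED = {"Nest1": NEST1, "Nest2": NEST2, "Profile": PROFILE}

# The projection schema from this message.
PROJECTION = {"type": "record", "name": "Profile", "fields": [
    {"name": "nest2", "type": [{"type": "array", "items": {
        "type": "record", "name": "Nest2", "fields": [
            {"name": "nest1", "type": {
                "type": "record", "name": "Nest1", "fields": [
                    {"name": "mpi", "type": "int"},
                    {"name": "ts", "type": "int"},
                ]}},
        ]}}]},
]}


def unwrap(schema):
    """Resolve named references and strip "null" from single-branch unions."""
    if isinstance(schema, str):
        return NAMED.get(schema, schema)  # primitive names stay as strings
    if isinstance(schema, list):
        branches = [b for b in schema if b != "null"]
        return unwrap(branches[0]) if len(branches) == 1 else schema
    return schema


def is_projection_of(proj, full):
    """True if every field in proj exists in full with a matching type."""
    proj, full = unwrap(proj), unwrap(full)
    # Primitives and unhandled multi-branch unions: require exact equality.
    if isinstance(proj, (str, list)) or isinstance(full, (str, list)):
        return proj == full
    if proj["type"] != full["type"]:
        return False
    if proj["type"] == "array":
        return is_projection_of(proj["items"], full["items"])
    if proj["type"] == "record":
        if proj["name"] != full["name"]:
            return False
        full_fields = {f["name"]: f["type"] for f in full["fields"]}
        return all(
            f["name"] in full_fields
            and is_projection_of(f["type"], full_fields[f["name"]])
            for f in proj["fields"]
        )
    return proj == full


print(is_projection_of(PROJECTION, PROFILE))  # True
```

Under these assumptions the projection is a field-wise subset of the full schema, which is what I mean by "compatible" above.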
Thanks,
Yan