Hi,

The data I am working with is described by the following three schemas:

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Profile",
"fields": [
{"name": "nest1", "type": [{ "type": "array", "items": "Nest1"}, "null"]},
{"name": "nest2", "type": [{ "type": "array", "items": "Nest2"}, "null"]},
{"name": "idd", "type": "long"} ]
}

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest1",
"fields": [
{"name": "ts", "type": "int"},
{"name": "mpi", "type": "int"},
{"name": "api1", "type": ["int", "null"], "default": -1},
{"name": "api2", "type": ["long", "null"], "default": -1},
{"name": "api3", "type": ["int", "null"], "default": -1} ]
}

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest2",
"fields": [
{"name": "nest1", "type": "Nest1"},
{"name": "ts", "type": "int"} ]
}

Basically, Profile contains two nested arrays of records, nest1 and nest2,
and each Nest2 record has a member of type Nest1. I try to read the data
with the following projection schema:

{
  "type": "record",
  "name": "Profile",
  "namespace": "avro.parquet.model",
  "fields": [
    {
      "name": "nest2",
      "type": [
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "Nest2",
            "fields": [
              {
                "name": "nest1",
                "type": {
                  "type": "record",
                  "name": "Nest1",
                  "fields": [
                    {
                      "name": "mpi",
                      "type": "int"
                    },
                    {
                      "name": "ts",
                      "type": "int"
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  ]
}

Reading with this projection fails with the following error:

org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'mpi' not found.
at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:133)
....


Looking at the code, I found that
org.apache.parquet.avro.AvroIndexedRecordConverter.AvroArrayConverter.isElementType(Type, Schema)
returns false even though its two arguments (the Parquet type and the Avro
schema) appear to be compatible. Is this a bug, or am I missing something?
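
As a rough illustration of what "compatible" could mean here, the sketch
below checks in plain Python that the projection schema is a structural
subset of the full Profile schema: every projected field must exist by name
in the corresponding record, with the same type. It is not the logic used by
parquet-avro; the helper names (resolve, branches, is_projection_of) are
hypothetical, defaults are omitted, and fields are matched by name only, so
the fact that the projection lists mpi before ts is ignored.

```python
# Full schemas from the message, kept in a registry so that named-type
# references such as "Nest1" can be resolved. Defaults are omitted.
NEST1 = {"type": "record", "name": "Nest1", "fields": [
    {"name": "ts", "type": "int"},
    {"name": "mpi", "type": "int"},
    {"name": "api1", "type": ["int", "null"]},
    {"name": "api2", "type": ["long", "null"]},
    {"name": "api3", "type": ["int", "null"]},
]}
NEST2 = {"type": "record", "name": "Nest2", "fields": [
    {"name": "nest1", "type": "Nest1"},
    {"name": "ts", "type": "int"},
]}
PROFILE = {"type": "record", "name": "Profile", "fields": [
    {"name": "nest1", "type": [{"type": "array", "items": "Nest1"}, "null"]},
    {"name": "nest2", "type": [{"type": "array", "items": "Nest2"}, "null"]},
    {"name": "idd", "type": "long"},
]}
NAMED = {"Nest1": NEST1, "Nest2": NEST2, "Profile": PROFILE}

# Projection schema from the message (namespace omitted).
PROJECTION = {"type": "record", "name": "Profile", "fields": [
    {"name": "nest2", "type": [{"type": "array", "items": {
        "type": "record", "name": "Nest2", "fields": [
            {"name": "nest1", "type": {
                "type": "record", "name": "Nest1", "fields": [
                    {"name": "mpi", "type": "int"},
                    {"name": "ts", "type": "int"},
                ]}},
        ]}}]},
]}


def resolve(schema):
    """Replace a named-type reference (e.g. "Nest1") by its definition."""
    if isinstance(schema, str) and schema in NAMED:
        return NAMED[schema]
    return schema


def branches(schema):
    """Return the non-null branches of a union, or the schema itself."""
    if isinstance(schema, list):
        return [resolve(b) for b in schema if b != "null"]
    return [schema]


def is_projection_of(proj, full):
    """True if every field requested in proj exists in full with the same type."""
    proj, full = resolve(proj), resolve(full)
    if isinstance(proj, list) or isinstance(full, list):
        # Unions: each projected branch must match some branch of the full union.
        return all(any(is_projection_of(p, f) for f in branches(full))
                   for p in branches(proj))
    if isinstance(proj, str) or isinstance(full, str):
        return proj == full  # primitive types must be identical
    if proj["type"] != full["type"]:
        return False
    if proj["type"] == "array":
        return is_projection_of(proj["items"], full["items"])
    if proj["type"] == "record":
        if proj["name"] != full["name"]:
            return False
        full_fields = {f["name"]: f["type"] for f in full["fields"]}
        return all(f["name"] in full_fields
                   and is_projection_of(f["type"], full_fields[f["name"]])
                   for f in proj["fields"])
    return False


print(is_projection_of(PROJECTION, PROFILE))
```

Under these assumptions the projection passes the name-based check, which
matches the claim that the two schemas look compatible; it says nothing
about how isElementType itself decides.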

Thanks,
Yan
