You are hitting PARQUET-364: https://issues.apache.org/jira/browse/PARQUET-364

Although the symptoms are different, your error and the one reported in that ticket share the same root cause.

Cheng

On 9/18/15 6:13 PM, Yan Qi wrote:
Hi,

The data I am working with has the following Avro schemas:

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Profile",
"fields": [
{"name": "nest1", "type": [{ "type": "array", "items": "Nest1"}, "null"]},
{"name": "nest2", "type": [{ "type": "array", "items": "Nest2"}, "null"]},
{"name": "idd", "type": "long"} ]
}

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest1",
"fields": [
{"name": "ts", "type": "int"},
{"name": "mpi", "type": "int"},
{"name": "api1", "type": ["int", "null"], "default": -1},
{"name": "api2", "type": ["long", "null"], "default": -1},
{"name": "api3", "type": ["int", "null"], "default": -1} ]
}

{
"namespace": "avro.parquet.model",
"type": "record",
"name": "Nest2",
"fields": [
{"name": "nest1", "type": "Nest1"},
{"name": "ts", "type": "int"} ]
}

Basically, there are two nested tables, nest1 and nest2, in the Profile
table, and Nest2 has a member of type Nest1. I try to read the data with
the following projection schema:

{
   "type": "record",
   "name": "Profile",
   "namespace": "avro.parquet.model",
   "fields": [
     {
       "name": "nest2",
       "type": [
         {
           "type": "array",
           "items": {
             "type": "record",
             "name": "Nest2",
             "fields": [
               {
                 "name": "nest1",
                 "type": {
                   "type": "record",
                   "name": "Nest1",
                   "fields": [
                     {
                       "name": "mpi",
                       "type": "int"
                     },
                     {
                       "name": "ts",
                       "type": "int"
                     }
                   ]
                 }
               }
             ]
           }
         }
       ]
     }
   ]
}

But I get the following error message:

org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'mpi' not found.
    at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:133)
    ....


When checking the code, I found that the function
org.apache.parquet.avro.AvroIndexedRecordConverter.AvroArrayConverter.isElementType(Type,
Schema) returns false even though its two parameters (type and schema)
are compatible. I am wondering whether this is a bug or whether I missed
something here.
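
The claim that the projection only asks for fields that exist can be checked outside Parquet. The following is a minimal sketch in plain Python, not parquet-avro code: the names missing_fields and WRITTEN_FIELDS are hypothetical, and the check compares field names only, ignoring types, field order, and union branches such as "null".

```python
import json

# Field names of each record as written, copied from the schemas above.
WRITTEN_FIELDS = {
    "Profile": {"nest1", "nest2", "idd"},
    "Nest1": {"ts", "mpi", "api1", "api2", "api3"},
    "Nest2": {"nest1", "ts"},
}


def missing_fields(schema, written=WRITTEN_FIELDS):
    """Return (record, field) pairs requested by `schema` but absent from `written`."""
    missing = []
    if isinstance(schema, list):
        # Union: check every branch.
        for branch in schema:
            missing += missing_fields(branch, written)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            known = written.get(schema["name"], set())
            for field in schema["fields"]:
                if field["name"] not in known:
                    missing.append((schema["name"], field["name"]))
                # Recurse into nested records, arrays, and unions.
                missing += missing_fields(field["type"], written)
        elif kind == "array":
            missing += missing_fields(schema["items"], written)
    # Primitive type names such as "int" fall through and add nothing.
    return missing


# The projection schema from this message, in compact form.
PROJECTION = json.loads("""
{"type": "record", "name": "Profile", "namespace": "avro.parquet.model",
 "fields": [{"name": "nest2", "type": [{"type": "array", "items":
   {"type": "record", "name": "Nest2", "fields": [{"name": "nest1", "type":
     {"type": "record", "name": "Nest1", "fields": [
       {"name": "mpi", "type": "int"},
       {"name": "ts", "type": "int"}]}}]}}]}]}
""")

print(missing_fields(PROJECTION))
```

An empty list means every requested field, including 'mpi' in Nest1, is present by name in the written schemas, so at this level the projection does not ask for anything that was never written.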

Thanks,
Yan

