    [ https://issues.apache.org/jira/browse/PARQUET-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Stankevich updated PARQUET-1046:
---------------------------------------
    Description: 
If a Thrift class has a field of type List<some_enum>, ParquetReader reports the
list's element type as enum (type id = 16), but it should report Int32.

Every field declared as an enum in the Thrift schema file has field type Int32
in the generated Java class. The same is true for List fields whose elements
are enums.

However, when ParquetReader creates an object, it uses type enum for the list's
elements instead of Int32.
This causes a problem: a list field with enum elements cannot be removed from
the schema. If such a field is removed from the schema file but is still
present in the Parquet file, ParquetReader tries to skip it because the field
is no longer in the schema. It calls TProtocolUtil.skip with type = 15 (list),
which then calls the same method for each list element with type = 16 (enum).
TProtocolUtil.skip has no case for this type in its switch statement, so the
list elements are not skipped, and an exception is thrown later when it tries
to skip the list end.
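
The failure mode described above can be illustrated with a small standalone
model. This is only a sketch, not Thrift's or Parquet's actual code: the class
SkipSketch, the byte layout of the list (element type, size, then 4-byte
values) and the helper names are assumptions made for illustration. Only the
type ids 8 (I32), 15 (list) and 16 (enum) come from Thrift. The point is that a
skip routine whose switch has no case for type 16 consumes nothing for each
element, so the reader is left in the middle of the list.

```java
import java.nio.ByteBuffer;

public class SkipSketch {
    // Type ids as used in the report (they match Thrift's TType constants).
    static final byte I32 = 8;
    static final byte LIST = 15;
    static final byte ENUM = 16;

    // Simplified stand-in for a recursive skip routine.
    // Hypothetical list layout: 1 byte element type, 4 bytes size, then elements.
    static void skip(ByteBuffer in, byte type) {
        switch (type) {
            case I32:
                in.getInt(); // consume one 4-byte value
                break;
            case LIST: {
                byte elemType = in.get();
                int size = in.getInt();
                for (int i = 0; i < size; i++) {
                    skip(in, elemType);
                }
                break;
            }
            default:
                // No case for ENUM (16): nothing is consumed for this element.
                break;
        }
    }

    // Encodes a list header followed by its values, using the layout above.
    static ByteBuffer encodeList(byte elemType, int... values) {
        ByteBuffer buf = ByteBuffer.allocate(5 + 4 * values.length);
        buf.put(elemType).putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        return buf;
    }

    // Skips one encoded list and reports how many bytes were left unread.
    static int bytesLeftAfterSkip(byte elemType, int... values) {
        ByteBuffer buf = encodeList(elemType, values);
        skip(buf, LIST);
        return buf.remaining();
    }

    public static void main(String[] args) {
        // Elements tagged as I32 are fully consumed.
        System.out.println(bytesLeftAfterSkip(I32, 1, 2, 3));  // prints 0
        // Elements tagged as ENUM are left behind: 3 values * 4 bytes.
        System.out.println(bytesLeftAfterSkip(ENUM, 1, 2, 3)); // prints 12
    }
}
```

In this model the leftover bytes are what a later "list end" step would trip
over. Tagging the elements as Int32, as the report suggests, lets the same
routine skip the whole list.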


> Impossible to read thrift object from parquet file if it has List<Enum> field 
> that was removed from thrift schema.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1046
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1046
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Andrei Stankevich
>
> If a Thrift class has a field of type List<some_enum>, ParquetReader reports
> the list's element type as enum (type id = 16), but it should report Int32.
> Every field declared as an enum in the Thrift schema file has field type
> Int32 in the generated Java class. The same is true for List fields whose
> elements are enums.
> However, when ParquetReader creates an object, it uses type enum for the
> list's elements instead of Int32.
> This causes a problem: a list field with enum elements cannot be removed from
> the schema. If such a field is removed from the schema file but is still
> present in the Parquet file, ParquetReader tries to skip it because the field
> is no longer in the schema. It calls TProtocolUtil.skip with type = 15 (list),
> which then calls the same method for each list element with type = 16 (enum).
> TProtocolUtil.skip has no case for this type in its switch statement, so the
> list elements are not skipped, and an exception is thrown later when it tries
> to skip the list end.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
