[ 
https://issues.apache.org/jira/browse/ORC-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972836#comment-16972836
 ] 

Shardul Mahadik commented on ORC-556:
-------------------------------------

I am hitting this issue: my read schema has some extra attributes that the 
file schema does not. As a result, ORC tries to evolve the schema, but then 
fails because the reader and file schemas have the same category (int). Do 
we have a solution for this?

> ConvertTreeReader can incorrectly be applied on columns of the same primitive 
> type
> ----------------------------------------------------------------------------------
>
>                 Key: ORC-556
>                 URL: https://issues.apache.org/jira/browse/ORC-556
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.6.0, 1.6.1
>            Reporter: Ratandeep Ratti
>            Priority: Major
>
> I'm seeing the following exception when reading old ORC data with Iceberg:
> {noformat}
> 0.0 in stage 0.0 (TID 0, executor 1): java.lang.IllegalArgumentException: No 
> conversion of type INT to self needed
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.ConvertTreeReaderFactory.createAnyIntegerConvertTreeReader(ConvertTreeReaderFactory.java:1659)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.ConvertTreeReaderFactory.createConvertTreeReader(ConvertTreeReaderFactory.java:2112)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2327)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1957)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2367)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1957)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2367)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:230)
>       at 
> org.apache.iceberg.shaded.org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:741)
>       at 
> org.apache.iceberg.orc.OrcIterable.newOrcIterator(OrcIterable.java:87)
>       at org.apache.iceberg.orc.OrcIterable.iterator(OrcIterable.java:72)
>       at 
> org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:470)
>       at 
> org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:422)
>       at 
> org.apache.iceberg.spark.source.Reader$TaskDataReader.<init>(Reader.java:356)
>       at 
> org.apache.iceberg.spark.source.Reader$ReadTask.createPartitionReader(Reader.java:305)
>       at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> {noformat}
> I think the problem lies in the following snippet in the method 
> {{org.apache.orc.impl.TreeReaderFactory#createTreeReader}}:
> {code}
> if (!fileType.equals(readerType) &&
>     ... // elided)) {
>       ...
> }
> {code}
> We are doing an equals comparison on the {{TypeDescription}} class. This 
> equals comparison can now fail for at least two reasons:
> # The reader schema has annotations [properties] and the old file schema 
> does not.
> # The reader schema field name differs in case from the file schema. This, 
> I suspect, is because the old data was written by Hive.
> At least the first can be fixed if we change 
> {code}
> fileType.equals(readerType) => 
> fileType.getCategory().equals(readerType.getCategory()) 
> {code}
> I'm currently unsure of the repercussions of this, so I haven't made this 
> change myself.
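
To illustrate the comparison discussed in the issue, here is a minimal, self-contained sketch. It does not use the ORC library: SimpleType, Category, sameCategory and the attribute key "some.attribute" are hypothetical stand-ins invented for this example, and the model is simplified (for instance, it stores the field name on the type itself). It only shows why a strict equals can report a mismatch for two int columns while a category-only check treats them as the same type.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;

public class SchemaCompareSketch {

    // Hypothetical stand-in for a type category; NOT the ORC enum.
    public enum Category { INT, STRING }

    // Hypothetical, simplified schema node; NOT org.apache.orc.TypeDescription.
    public static final class SimpleType {
        final Category category;
        final String fieldName;
        final Map<String, String> attributes;

        public SimpleType(Category category, String fieldName,
                          Map<String, String> attributes) {
            this.category = category;
            this.fieldName = fieldName;
            this.attributes = attributes;
        }

        // Strict equality: category, field name (case-sensitive) and
        // attributes must all match.
        @Override
        public boolean equals(Object other) {
            if (!(other instanceof SimpleType)) {
                return false;
            }
            SimpleType that = (SimpleType) other;
            return category == that.category
                && fieldName.equals(that.fieldName)
                && attributes.equals(that.attributes);
        }

        @Override
        public int hashCode() {
            return Objects.hash(category, fieldName, attributes);
        }
    }

    // Relaxed check in the spirit of the change proposed in the issue:
    // only the categories are compared.
    public static boolean sameCategory(SimpleType fileType, SimpleType readerType) {
        return fileType.category == readerType.category;
    }

    public static void main(String[] args) {
        // Old file schema: upper-case name, no attributes.
        SimpleType fileType =
            new SimpleType(Category.INT, "ID", Collections.emptyMap());
        // Reader schema: lower-case name plus an illustrative attribute.
        SimpleType readerType = new SimpleType(Category.INT, "id",
            Collections.singletonMap("some.attribute", "1"));

        System.out.println("strict equals: " + fileType.equals(readerType));
        System.out.println("same category: " + sameCategory(fileType, readerType));
    }
}
```

In this sketch the strict comparison is false and the category comparison is true, which matches the situation in the stack trace: the reader decides a conversion is needed even though both sides are INT. Whether a category-only check is sufficient for real ORC types is exactly the open question raised in the issue; this sketch does not answer it.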


