[
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
]
Laszlo Bodor edited comment on ORC-539 at 9/11/19 2:49 PM:
-----------------------------------------------------------
Small repro without partitions and with a single column:
{code}
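-- source table holding sample values for every primitive type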
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean,
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1
decimal(38,18), float1 float, double1 double, string1 string, string2 string,
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string,
smallint_str string, int_str string, bigint_str string, decimal_str string,
float_str string, double_str string, date_str string, timestamp_str string,
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into
table schema_evolution_data_n41;
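-- write the data out with a FLOAT column first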
CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);
insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM
schema_evolution_data_n41;
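-- evolve the column type to TIMESTAMP without rewriting the data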
alter table part_change_various_various_timestamp_n6 replace columns (c6
TIMESTAMP);
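-- reading the evolved column exercises the float-to-timestamp conversion and fails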
select c6 from part_change_various_various_timestamp_n6;
{code}
The problem is that ORC-531, which is responsible for handling float/double types in the convert tree reader, is missing from the internal branch:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L828-L830
Without that check, the reader probably decodes the float column as if it were double, hence the error. With the check in place the issue disappears (although I still get a result mismatch, which I'm investigating), but I think TestSchemaEvolution#testEvolutionToTimestamp still needs to be improved to cover float evolution.
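As a hypothetical standalone sketch (assumed names, not ORC code) of the effect described above: writing 4-byte floats and then decoding the same bytes as 8-byte doubles runs out of input halfway through, which matches the "Read past EOF" in the stack trace below. The check referenced above presumably avoids this by reading the stream according to the file's actual type.
{code}
// Hypothetical sketch (not ORC code): decoding 4-byte floats as 8-byte
// doubles exhausts the stream early, analogous to DoubleTreeReader being
// applied to a FLOAT column.
import java.io.*;

public class FloatReadAsDouble {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    for (int i = 0; i < 1024; i++) {
      out.writeFloat(i * 0.5f);        // writer emits 4 bytes per value
    }
    out.close();

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    try {
      for (int i = 0; i < 1024; i++) {
        in.readDouble();               // reader consumes 8 bytes per value
      }
    } catch (EOFException e) {
      // thrown after ~512 values: the stream is half the size the reader expects
      System.out.println("Read past EOF, like the exception in this ticket");
    }
  }
}
{code}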
was (Author: abstractdog):
It fails for float and double too; a simple repro which can be used with a double or float source (1 column, no partitions):
{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean,
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1
decimal(38,18), float1 float, double1 double, string1 string, string2 string,
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string,
smallint_str string, int_str string, bigint_str string, decimal_str string,
float_str string, double_str string, date_str string, timestamp_str string,
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into
table schema_evolution_data_n41;
CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);
insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM
schema_evolution_data_n41;
alter table part_change_various_various_timestamp_n6 replace columns (c6
TIMESTAMP);
select c6 from part_change_various_various_timestamp_n6;
{code}
> Exception in double to timestamp schema evolution
> -------------------------------------------------
>
> Key: ORC-539
> URL: https://issues.apache.org/jira/browse/ORC-539
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Laszlo Bodor
> Priority: Major
>
> I backported ORC-189 to my own branch and ran tests in Hive. I am getting the
> following exception in a test related to schema evolution from double to
> timestamp after applying ORC-189:
> {noformat}
> Caused by: java.io.IOException: Error reading file: file:/Users/jcamachorodriguez/src/workspaces/hive/itests/qtest/target/localfs/warehouse/part_change_various_various_timestamp_n6/part=1/000000_0
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1289)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:87)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:103)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:252)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:227)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
>   ... 23 more
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 7 kind DATA position: 15 length: 15 range: 0 offset: 122 limit: 122 range 0 = 0 to 15 uncompressed: 12 to 12
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:125)
>   at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:108)
>   at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:104)
>   at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:783)
>   at org.apache.orc.impl.ConvertTreeReaderFactory$TimestampFromDoubleTreeReader.nextVector(ConvertTreeReaderFactory.java:1883)
>   at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2012)
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1282)
>   ... 28 more
> {noformat}