[
https://issues.apache.org/jira/browse/ORC-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927546#comment-16927546
]
Laszlo Bodor edited comment on ORC-539 at 9/11/19 2:49 PM:
-----------------------------------------------------------
Small repro without partitions and with a single column:
{code}
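-- source table holding sample values for every primitive type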
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean,
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1
decimal(38,18), float1 float, double1 double, string1 string, string2 string,
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string,
smallint_str string, int_str string, bigint_str string, decimal_str string,
float_str string, double_str string, date_str string, timestamp_str string,
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into
table schema_evolution_data_n41;
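-- write the data out with a FLOAT column first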
CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);
insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM
schema_evolution_data_n41;
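-- evolve the column type to TIMESTAMP without rewriting the data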
alter table part_change_various_various_timestamp_n6 replace columns (c6
TIMESTAMP);
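-- reading the evolved column exercises the float-to-timestamp conversion and fails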
select c6 from part_change_various_various_timestamp_n6;
{code}
The problem is that ORC-531, which is responsible for handling float/double types in the convert tree reader, is missing from the internal branch:
https://github.com/apache/orc/blame/master/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L828-L830
Without that check, the reader probably decodes the float column as if it were double, hence the error. With the check in place the issue disappears (although I still get a result mismatch, which I'm investigating), but I think TestSchemaEvolution#testEvolutionToTimestamp still needs to be improved to cover float evolution.
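As a hypothetical standalone sketch (assumed names, not ORC code) of the effect described above: writing 4-byte floats and then decoding the same bytes as 8-byte doubles runs out of input halfway through, which matches the "Read past EOF" in the stack trace below. The check referenced above presumably avoids this by reading the stream according to the file's actual type.
{code}
// Hypothetical sketch (not ORC code): decoding 4-byte floats as 8-byte
// doubles exhausts the stream early, analogous to DoubleTreeReader being
// applied to a FLOAT column.
import java.io.*;

public class FloatReadAsDouble {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    for (int i = 0; i < 1024; i++) {
      out.writeFloat(i * 0.5f);        // writer emits 4 bytes per value
    }
    out.close();

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    try {
      for (int i = 0; i < 1024; i++) {
        in.readDouble();               // reader consumes 8 bytes per value
      }
    } catch (EOFException e) {
      // thrown after ~512 values: the stream is half the size the reader expects
      System.out.println("Read past EOF, like the exception in this ticket");
    }
  }
}
{code}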
was (Author: abstractdog):
It fails for float and double too; a simple repro which can be used with a double or float source (1 column, no partitions):
{code}
CREATE TABLE schema_evolution_data_n41(insert_num int, boolean1 boolean,
tinyint1 tinyint, smallint1 smallint, int1 int, bigint1 bigint, decimal1
decimal(38,18), float1 float, double1 double, string1 string, string2 string,
date1 date, timestamp1 timestamp, boolean_str string, tinyint_str string,
smallint_str string, int_str string, bigint_str string, decimal_str string,
float_str string, double_str string, date_str string, timestamp_str string,
filler string)
row format delimited fields terminated by '|' stored as textfile;
load data local inpath
'../../data/files/schema_evolution/schema_evolution_data.txt' overwrite into
table schema_evolution_data_n41;
CREATE TABLE part_change_various_various_timestamp_n6(c6 FLOAT);
insert into table part_change_various_various_timestamp_n6 SELECT float1 FROM
schema_evolution_data_n41;
alter table part_change_various_various_timestamp_n6 replace columns (c6
TIMESTAMP);
select c6 from part_change_various_various_timestamp_n6;
{code}
> Exception in double to timestamp schema evolution
> -------------------------------------------------
>
> Key: ORC-539
> URL: https://issues.apache.org/jira/browse/ORC-539
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Laszlo Bodor
> Priority: Major
>
> I backported ORC-189 to my own branch and ran tests in Hive. I am getting the
> following exception in a test related to schema evolution from double to
> timestamp after applying ORC-189:
> {noformat}
> Caused by: java.io.IOException: Error reading file: file:/Users/jcamachorodriguez/src/workspaces/hive/itests/qtest/target/localfs/warehouse/part_change_various_various_timestamp_n6/part=1/000000_0
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1289)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:87)
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:103)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:252)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:227)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
>   ... 23 more
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 7 kind DATA position: 15 length: 15 range: 0 offset: 122 limit: 122 range 0 = 0 to 15 uncompressed: 12 to 12
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:125)
>   at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:108)
>   at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:104)
>   at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:783)
>   at org.apache.orc.impl.ConvertTreeReaderFactory$TimestampFromDoubleTreeReader.nextVector(ConvertTreeReaderFactory.java:1883)
>   at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2012)
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1282)
>   ... 28 more
> {noformat}