Gabor Kaszab created IMPALA-9290:
------------------------------------
Summary: ORC scanner should support schema evolution between date
and timestamp types
Key: IMPALA-9290
URL: https://issues.apache.org/jira/browse/IMPALA-9290
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Gabor Kaszab
*This is the desired use case:*
1. Create an ORC table TBL1 with a DATE column.
2. Create an ORC table TBL2 with a TIMESTAMP column that has the same location
as TBL1.
3. Insert some DATE values into TBL1 and some TIMESTAMP values into TBL2.
4. SELECT from TBL1 returns both the DATE and the TIMESTAMP values, with the
TIMESTAMP values converted to DATE.
5. SELECT from TBL2 returns both the DATE and the TIMESTAMP values, with the
DATE values converted to TIMESTAMP.
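The two conversions in steps 4 and 5 could look roughly like the sketch below. It is only an illustration: the helper names are hypothetical (not Impala or ORC functions), and it assumes that a DATE is a count of days since 1970-01-01, that a TIMESTAMP carries seconds since the same epoch, that TIMESTAMP to DATE drops the time of day, and that DATE to TIMESTAMP yields midnight. Time zone handling is ignored.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not Impala APIs.
constexpr int64_t kSecondsPerDay = 86400;

// TIMESTAMP -> DATE (assumed semantics: drop the time of day).
// Uses floor division so that pre-1970 timestamps map to the earlier day
// instead of being rounded toward zero.
inline int32_t TimestampSecondsToDays(int64_t seconds) {
  int64_t days = seconds / kSecondsPerDay;
  if (seconds % kSecondsPerDay < 0) --days;
  return static_cast<int32_t>(days);
}

// DATE -> TIMESTAMP (assumed semantics: midnight of that day).
inline int64_t DaysToTimestampSeconds(int32_t days) {
  return static_cast<int64_t>(days) * kSecondsPerDay;
}
```

The floor division matters only for values before the epoch: one second before 1970-01-01 00:00:00 belongs to day -1, not day 0.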
Without this feature, Impala returns an error:
{code:java}
ERROR: Type mismatch: table column DATE is map to column timestamp in ORC file
'hdfs://localhost:20500/test-warehouse/orc_date_tbl/000000_0_copy_1'
{code}
*Note:*
With https://issues.apache.org/jira/browse/IMPALA-8801 implementing the DATE
type for ORC, it is possible to read DATE values in ORC format. However, writing
is still not supported and has to be done by Hive.
*Let me copy-paste a code review comment from IMPALA-8801 as a suggestion for
the implementation:*
We can modify OrcTimestampReader to support reading orc::TimestampVectorBatch
into Date type slots. In its constructor it knows which kind of slots
(timestamp or date) it's writing to. So in ReadValue() it can have different
behaviors based on different modes (timestamp values => timestamp slots /
timestamp values => date slots). We can do the same on OrcDateColumnReader to
let it support reading ORC Date values into Timestamp type slots.
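
A minimal sketch of that idea follows, with the mode fixed in the constructor and ReadValue() branching on it. Every type and class name here is a hypothetical stand-in (not the actual OrcTimestampReader, slot, or ORC batch classes), and the timestamp-to-date conversion assumes seconds since 1970-01-01 with the time of day dropped.

```cpp
#include <cstdint>

// All names below are hypothetical stand-ins, not Impala or ORC classes.
enum class SlotType { kDate, kTimestamp };

// Stand-in for one value taken from a timestamp batch.
struct TimestampValue {
  int64_t seconds;  // seconds since 1970-01-01
  int64_t nanos;    // sub-second part
};

// Stand-in for a tuple slot that can hold either type.
struct Slot {
  SlotType type = SlotType::kTimestamp;
  int32_t date_days = 0;  // days since 1970-01-01, valid when type == kDate
  TimestampValue ts{0, 0};
};

class TimestampReaderSketch {
 public:
  // The target slot type is known at construction, so the mode never changes
  // for the lifetime of the reader.
  explicit TimestampReaderSketch(SlotType slot_type) : slot_type_(slot_type) {}

  void ReadValue(const TimestampValue& in, Slot* out) const {
    out->type = slot_type_;
    if (slot_type_ == SlotType::kTimestamp) {
      // timestamp values => timestamp slots: copy as-is.
      out->ts = in;
    } else {
      // timestamp values => date slots: floor to whole days (assumed semantics).
      int64_t days = in.seconds / 86400;
      if (in.seconds % 86400 < 0) --days;
      out->date_days = static_cast<int32_t>(days);
    }
  }

 private:
  const SlotType slot_type_;
};
```

A date reader could mirror this in the other direction, writing midnight of the given day when its target slot is a timestamp.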
Note that the life cycle of an OrcColumnReader is within the life cycle of the
HdfsOrcScanner which only reads a split of an ORC file, and an ORC file can't
have two types for one column (e.g. column1 is timestamp in stripe1 and is date
in stripe2). So we don't need to deal with different batch types in
UpdateInputBatch().
BTW, it'd be better to add test coverage for this type compatibility check in
test_scanners.py (see TestOrc.test_type_conversions).