[ https://issues.apache.org/jira/browse/HIVE-26658?focusedWorklogId=819571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-819571 ]

ASF GitHub Bot logged work on HIVE-26658:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Oct/22 09:58
            Start Date: 24/Oct/22 09:58
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request, #3698:
URL: https://github.com/apache/hive/pull/3698

   ### What changes were proposed in this pull request?
   1. Unify converters from Parquet INT64 to Hive types.
   2. Add tests for reading Parquet INT64 timestamps into various Hive numeric types.
   
   ### Why are the changes needed?
   Restore backward compatibility by allowing INT64 columns with the TIMESTAMP annotation to be mapped to the following Hive numeric types:
   * TINYINT
   * SMALLINT
   * INT
   * DOUBLE
   * FLOAT
   * DECIMAL
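To make the intended mapping concrete, here is a minimal Python sketch of a single converter that maps a raw INT64 value (for example, a timestamp-millis count) to each of the numeric types listed above. It is not the PR's Java implementation. The function name `convert_int64`, passing the raw long through unchanged, and returning `None` (NULL) when the value does not fit the target type are all assumptions made for illustration. The default `DECIMAL(10, 0)` matches Hive's default precision and scale for an unqualified DECIMAL.

```python
import struct
from decimal import Decimal

# Hypothetical sketch, not Hive's code: map a raw Parquet INT64 value
# (e.g. a timestamp-millis count) to a Hive numeric type. The raw long is
# passed through unchanged; returning None (SQL NULL) when it does not fit
# the target type is an illustrative assumption.

# Signed two's-complement ranges of Hive's integral types.
INTEGRAL_RANGES = {
    "tinyint": (-(2**7), 2**7 - 1),     # 8-bit
    "smallint": (-(2**15), 2**15 - 1),  # 16-bit
    "int": (-(2**31), 2**31 - 1),       # 32-bit
    "bigint": (-(2**63), 2**63 - 1),    # 64-bit, any INT64 fits
}


def convert_int64(value: int, hive_type: str, precision: int = 10, scale: int = 0):
    """Return `value` converted to `hive_type`, or None if it does not fit."""
    hive_type = hive_type.lower()
    if hive_type in INTEGRAL_RANGES:
        low, high = INTEGRAL_RANGES[hive_type]
        return value if low <= value <= high else None
    if hive_type == "double":
        return float(value)
    if hive_type == "float":
        # Round-trip through a 32-bit IEEE 754 float to mimic single precision.
        return struct.unpack("<f", struct.pack("<f", float(value)))[0]
    if hive_type == "decimal":
        # DECIMAL(p, s) holds at most p - s digits before the decimal point.
        if abs(value) >= 10 ** (precision - scale):
            return None
        return Decimal(value)
    raise ValueError(f"unsupported Hive type: {hive_type}")


ts_millis = 1666605480000  # 2022-10-24T09:58:00Z as milliseconds since the epoch
for t in ("tinyint", "smallint", "int", "bigint", "float", "double"):
    print(t, convert_int64(ts_millis, t))
print("decimal(13,0)", convert_int64(ts_millis, "decimal", precision=13))
```

Under these assumptions the example timestamp overflows INT (and therefore SMALLINT and TINYINT) as well as the default DECIMAL(10, 0), while BIGINT, DOUBLE and DECIMAL(13, 0) keep it exactly and FLOAT keeps only about seven significant digits. Whether an overflowing value should become NULL or raise an error is a design choice this sketch does not settle for Hive.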
   
   For more details see HIVE-26658.
   
   ### Does this PR introduce _any_ user-facing change?
   Avoids errors/exceptions when mapping Parquet INT64 columns with the TIMESTAMP annotation to Hive types other than TIMESTAMP and BIGINT.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestETypeConverter
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=parquet_int64_timestamp_to_numeric.q
   ```




Issue Time Tracking
-------------------

            Worklog Id:     (was: 819571)
    Remaining Estimate: 0h
            Time Spent: 10m

> INT64 Parquet timestamps cannot be mapped to most Hive numeric types
> --------------------------------------------------------------------
>
>                 Key: HIVE-26658
>                 URL: https://issues.apache.org/jira/browse/HIVE-26658
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, Serializers/Deserializers
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Minor
>              Labels: backwards-compatibility
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When attempting to read a Parquet file with a column of primitive type INT64
> and logical type
> [TIMESTAMP|https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/LogicalTypes.md?plain=1#L337],
> an error is raised if the column is mapped to a Hive type other than TIMESTAMP or BIGINT.
> Consider a Parquet file (e.g., ts_file.parquet) with the following schema:
> {code:json}
> {
>   "name": "eventtime",
>   "type": ["null", {
>     "type": "long",
>     "logicalType": "timestamp-millis"
>   }],
>   "default": null
> }
> {code}
>  
> Mapping the column to one of the Hive numeric types TINYINT, SMALLINT, INT,
> FLOAT, DOUBLE, or DECIMAL and then running a SELECT raises an error.
> The following snippet reproduces the problem.
> {code:sql}
> CREATE TABLE ts_table (eventtime INT) STORED AS PARQUET;
> LOAD DATA LOCAL INPATH 'ts_file.parquet' into table ts_table;
> SELECT * FROM ts_table;
> {code}
> This is a regression caused by HIVE-21215. Although HIVE-21215 made it possible to
> read INT64 columns as Hive TIMESTAMP, which was not possible before, it also
> broke the mapping to every other Hive numeric type. The problem was recently
> fixed for the BIGINT type only (HIVE-26612).
> The primary goal of this ticket is to restore backward compatibility, since
> these use cases worked before HIVE-21215.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
