[ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901291#comment-16901291
 ] 

Piotr Findeisen commented on HIVE-21376:
----------------------------------------

bq. If bucketed data with those types has been written in 3.0 using v2, a user 
should recreate those bucketed tables using a more recent Hive version.

To me that means Hive 3 should not be deployed on production until this issue 
is fixed.
It's fixed in 3.1.2, but the latest available from HDP is 3.1.0.

[~jcamachorodriguez] do you have a timeline for when 3.1.2 will be available in HDP?


> Incompatible change in Hive bucket computation
> ----------------------------------------------
>
>                 Key: HIVE-21376
>                 URL: https://issues.apache.org/jira/browse/HIVE-21376
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: David Phillips
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>             Fix For: 4.0.0, 3.2.0, 3.1.2
>
>         Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
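
The two effects described above can be reproduced with the plain JDK, without any Hive classes. The following is only a rough sketch, not Hive's actual code: the class name {{BucketHashSketch}} and all of its helper methods are hypothetical, and the bucket formula (mask the sign bit, then take the remainder by the bucket count) is an assumption about how a hash code is mapped to a bucket. It compares days since epoch with {{java.time.LocalDate.hashCode()}} for the same date, and compares epoch seconds taken from {{java.sql.Timestamp}} under a given JVM time zone with epoch seconds taken from a {{LocalDateTime}} relative to UTC.

```java
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.TimeZone;

// Hypothetical helper class for illustration; none of this is Hive code.
final class BucketHashSketch {

    // Assumed bucket formula: clear the sign bit, then take the remainder.
    static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // DATE hash as the issue describes the old behaviour: days since epoch.
    static int daysSinceEpochHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // DATE hash as the issue describes the new behaviour: LocalDate.hashCode().
    static int localDateHash(LocalDate date) {
        return date.hashCode();
    }

    // Epoch seconds from java.sql.Timestamp, which interprets the wall-clock
    // string in the JVM default time zone. The default is swapped temporarily.
    static long epochSecondsInZone(String wallClock, TimeZone zone) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(zone);
            return Timestamp.valueOf(wallClock).getTime() / 1000L;
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Epoch seconds from a LocalDateTime interpreted relative to UTC.
    static long epochSecondsUtc(LocalDateTime wallClock) {
        return wallClock.toEpochSecond(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        int numBuckets = 32;
        LocalDate date = LocalDate.of(2019, 3, 5);
        int oldHash = daysSinceEpochHash(date);
        int newHash = localDateHash(date);
        System.out.println("DATE days-since-epoch hash: " + oldHash
                + " -> bucket " + bucketFor(oldHash, numBuckets));
        System.out.println("DATE LocalDate.hashCode():  " + newHash
                + " -> bucket " + bucketFor(newHash, numBuckets));

        TimeZone zone = TimeZone.getTimeZone("America/Los_Angeles");
        long zoned = epochSecondsInZone("2019-03-05 12:00:00", zone);
        long utc = epochSecondsUtc(LocalDateTime.of(2019, 3, 5, 12, 0, 0));
        System.out.println("TIMESTAMP epoch seconds, JVM zone: " + zoned);
        System.out.println("TIMESTAMP epoch seconds, UTC:      " + utc);
    }
}
```

If the two hash inputs differ for the same value, as they do here, rows written under one scheme can land in a different bucket than a reader using the other scheme expects. The sketch does not cover the millisecond rounding difference mentioned in the first list item.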



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
