[ 
https://issues.apache.org/jira/browse/HIVE-25979?focusedWorklogId=732256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732256
 ]

ASF GitHub Bot logged work on HIVE-25979:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Feb/22 12:13
            Start Date: 24/Feb/22 12:13
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #3050:
URL: https://github.com/apache/hive/pull/3050


   ### What changes were proposed in this pull request?
   Lineage entries are sorted before printing them. The comparator used for the 
sorting uses the string representation of `Partition` objects for comparing 
them.
   This patch proposes comparing `Partition` objects by their partition values 
only.
   
   ### Why are the changes needed?
   More than one `Partition` object may represent the same partition in the 
lineage info map. This is very common, since each column has its own entry 
but shares the same partition. There are cases when more than one branch of the 
statement updates the same partition. In this case the properties of the cached 
`Partition` objects may differ:
   ```
   stats_part PARTITION(p=101).key ...]   -> Partition[(p=101)... 
transient_lastDdlTime=1645697627,...]
   stats_part PARTITION(p=101).value ...] -> Partition[(p=101)... 
transient_lastDdlTime=1645697627,...]
   stats_part PARTITION(p=101).key ...]   -> Partition[(p=101)... 
transient_lastDdlTime=1645697628,...]
   stats_part PARTITION(p=101).value ...] -> Partition[(p=101)... 
transient_lastDdlTime=1645697628,...]
   ```
   This difference changes the behavior of the comparator used for sorting 
lineage entries. The printed entries contain only the partition values, so 
these are the only values that should be used when comparing partitions in this case.
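
A minimal sketch of the effect described above, not the actual Hive code: 
`PartitionStub`, `LineageKey`, `BY_STRING` and `BY_VALUES` are hypothetical 
names made up for illustration, and the assumption that entries are ordered by 
partition first and column name second is mine. It only shows how a property 
such as `transient_lastDdlTime` leaking into the sort key changes the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LineageOrderSketch {
    // Hypothetical stand-in for a cached Partition object (not Hive's class).
    static final class PartitionStub {
        final List<String> values;   // partition values, e.g. ["101"]
        final long lastDdlTime;      // stands in for transient_lastDdlTime

        PartitionStub(List<String> values, long lastDdlTime) {
            this.values = values;
            this.lastDdlTime = lastDdlTime;
        }

        @Override
        public String toString() {
            // The string form includes a property that is never printed in the lineage.
            return "Partition[" + values + ", transient_lastDdlTime=" + lastDdlTime + "]";
        }
    }

    // Hypothetical lineage key: a partition plus a column name.
    static final class LineageKey {
        final PartitionStub partition;
        final String column;

        LineageKey(PartitionStub partition, String column) {
            this.partition = partition;
            this.column = column;
        }
    }

    // Orders by the partition's string form, so differing properties affect the result.
    static final Comparator<LineageKey> BY_STRING =
        Comparator.comparing((LineageKey k) -> k.partition.toString())
                  .thenComparing((LineageKey k) -> k.column);

    // Orders by partition values only, then by column name.
    static final Comparator<LineageKey> BY_VALUES =
        Comparator.comparing((LineageKey k) -> String.join("/", k.partition.values))
                  .thenComparing((LineageKey k) -> k.column);

    // Two cached copies of partition p=101 that differ only in lastDdlTime.
    static List<LineageKey> sampleKeys() {
        PartitionStub first = new PartitionStub(Arrays.asList("101"), 1645697627L);
        PartitionStub second = new PartitionStub(Arrays.asList("101"), 1645697628L);
        return Arrays.asList(
            new LineageKey(first, "key"), new LineageKey(first, "value"),
            new LineageKey(second, "key"), new LineageKey(second, "value"));
    }

    // Sorts a copy of the keys and returns the column names in sorted order.
    static List<String> sortedColumns(List<LineageKey> keys, Comparator<LineageKey> cmp) {
        List<LineageKey> copy = new ArrayList<>(keys);
        copy.sort(cmp);
        return copy.stream().map(k -> k.column).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedColumns(sampleKeys(), BY_STRING)); // [key, value, key, value]
        System.out.println(sortedColumns(sampleKeys(), BY_VALUES)); // [key, key, value, value]
    }
}
```

In this sketch the string-based comparator groups entries by cached object, so 
the result depends on whether the two copies happen to carry the same timestamp. 
The value-based comparator gives the same order in either case.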
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Lineage order is more stable when running the same statement several 
times.
   
   ### How was this patch tested?
   Run flaky-check: http://ci.hive.apache.org/job/hive-flaky-check/530/
   Parameters:
   ```
   -Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl 
itests/qtest -Pitests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 732256)
    Remaining Estimate: 0h
            Time Spent: 10m

> Order of Lineage is flaky in qtest output
> -----------------------------------------
>
>                 Key: HIVE-25979
>                 URL: https://issues.apache.org/jira/browse/HIVE-25979
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
> {code}
> The lineage output of the statement:
> {code:java}
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p
> {code}
> is expected to be
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
> but sometimes it is
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
