[ https://issues.apache.org/jira/browse/HIVE-25979?focusedWorklogId=732256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732256 ]
ASF GitHub Bot logged work on HIVE-25979:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Feb/22 12:13
Start Date: 24/Feb/22 12:13
Worklog Time Spent: 10m
Work Description: kasakrisz opened a new pull request #3050:
URL: https://github.com/apache/hive/pull/3050
### What changes were proposed in this pull request?
Lineage entries are sorted before they are printed. The comparator used for
sorting compares `Partition` objects by their string representation.
This patch proposes comparing `Partition` objects by their partition values
only.
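A minimal sketch of the idea, assuming a hypothetical `PartitionKey` class that stands in for Hive's `Partition` (this is not the real class or the actual patch): two objects for `p=101` that differ only in a cached `transient_lastDdlTime` compare as equal by their values, but not by their string form. Values are compared as strings, element by element, which keeps the order consistent with what is printed.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of a values-only partition comparator. PartitionKey is a hypothetical
 * stand-in for Hive's Partition class, not the real API.
 */
public class PartitionValuesComparator {

  static final class PartitionKey {
    final List<String> values; // partition values, e.g. ["101"] for p=101
    final long lastDdlTime;    // cached property such as transient_lastDdlTime

    PartitionKey(List<String> values, long lastDdlTime) {
      this.values = values;
      this.lastDdlTime = lastDdlTime;
    }

    // Mimics a toString() that also includes cached properties.
    @Override
    public String toString() {
      return "Partition[" + values + ", transient_lastDdlTime=" + lastDdlTime + "]";
    }
  }

  // Orders partitions by their values only: element by element, then by count.
  static final Comparator<PartitionKey> BY_VALUES = (a, b) -> {
    int n = Math.min(a.values.size(), b.values.size());
    for (int i = 0; i < n; i++) {
      int c = a.values.get(i).compareTo(b.values.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.values.size(), b.values.size());
  };

  // The previous approach: compare the full string representation.
  static final Comparator<PartitionKey> BY_STRING =
      Comparator.comparing(PartitionKey::toString);

  public static void main(String[] args) {
    PartitionKey first = new PartitionKey(List.of("101"), 1645697627L);
    PartitionKey second = new PartitionKey(List.of("101"), 1645697628L);
    // Same partition values, so the values-only comparator reports equality.
    System.out.println(BY_VALUES.compare(first, second));
    // The string form differs only in the timestamp, so it is not equal.
    System.out.println(BY_STRING.compare(first, second) < 0);
  }
}
```

The sketch does not handle `null` values or lists; a real implementation would have to decide how to order those.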
### Why are the changes needed?
More than one `Partition` object may represent the same partition in the
lineage info map. This is very common, since each column has its own entry
while sharing the same partition. When more than one branch of a statement
updates the same partition, the properties of the cached `Partition` objects
may differ, for example in `transient_lastDdlTime`:
```
stats_part PARTITION(p=101).key ...] -> Partition[(p=101)... transient_lastDdlTime=1645697627,...]
stats_part PARTITION(p=101).value ...] -> Partition[(p=101)... transient_lastDdlTime=1645697627,...]
stats_part PARTITION(p=101).key ...] -> Partition[(p=101)... transient_lastDdlTime=1645697628,...]
stats_part PARTITION(p=101).value ...] -> Partition[(p=101)... transient_lastDdlTime=1645697628,...]
```
Because the string representation includes such properties, this difference
changes the result of the comparator used to sort lineage entries, and hence
the printed order. The printed entries contain only the partition values, so
these are the only values that should be used when comparing partitions in
this case.
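To illustrate how the printed order flips, here is a simplified simulation, assuming a hypothetical `Entry` class (partition plus column name) rather than Hive's real lineage types. Four entries for `p=101` are sorted by partition and then by column name. With the string-based comparator the result depends on whether the two branches recorded the same `transient_lastDdlTime`; with the values-only comparator it does not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Simplified simulation of sorting lineage entries. Entry is a hypothetical
 * stand-in for a lineage key (partition + column), not Hive's real class.
 */
public class LineageOrderDemo {

  static final class Entry {
    final String partValues; // the part that is printed, e.g. "p=101"
    final long lastDdlTime;  // cached property that may differ between branches
    final String column;     // e.g. "key" or "value"

    Entry(String partValues, long lastDdlTime, String column) {
      this.partValues = partValues;
      this.lastDdlTime = lastDdlTime;
      this.column = column;
    }

    // Full string form, including the cached property.
    String partitionString() {
      return "Partition[(" + partValues + ") transient_lastDdlTime=" + lastDdlTime + "]";
    }
  }

  static final Comparator<Entry> BY_STRING = Comparator.comparing(Entry::partitionString);
  static final Comparator<Entry> BY_VALUES = Comparator.comparing((Entry e) -> e.partValues);

  // Two insert branches each contribute a "key" and a "value" entry for p=101.
  static List<Entry> sample(long firstBranchTime, long secondBranchTime) {
    List<Entry> entries = new ArrayList<>();
    entries.add(new Entry("p=101", firstBranchTime, "key"));
    entries.add(new Entry("p=101", firstBranchTime, "value"));
    entries.add(new Entry("p=101", secondBranchTime, "key"));
    entries.add(new Entry("p=101", secondBranchTime, "value"));
    return entries;
  }

  // Sorts by partition first, then by column name, and returns the printed form.
  static List<String> sortAndPrint(List<Entry> entries, Comparator<Entry> byPartition) {
    List<Entry> sorted = new ArrayList<>(entries);
    sorted.sort(byPartition.thenComparing((Entry e) -> e.column));
    List<String> printed = new ArrayList<>();
    for (Entry e : sorted) {
      printed.add(e.partValues + "." + e.column);
    }
    return printed;
  }

  public static void main(String[] args) {
    // String comparator, timestamps differ: key, value, key, value
    System.out.println(sortAndPrint(sample(1645697627L, 1645697628L), BY_STRING));
    // String comparator, timestamps equal: key, key, value, value
    System.out.println(sortAndPrint(sample(1645697627L, 1645697627L), BY_STRING));
    // Values-only comparator, timestamps differ: key, key, value, value
    System.out.println(sortAndPrint(sample(1645697627L, 1645697628L), BY_VALUES));
  }
}
```

The first two runs reproduce the "sometimes" and "expected" outputs from the issue description below; the third shows that ignoring cached properties makes the order independent of the timestamps.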
### Does this PR introduce _any_ user-facing change?
Yes. The order of lineage entries is more stable when the same statement is
run several times.
### How was this patch tested?
Ran the flaky-check job: http://ci.hive.apache.org/job/hive-flaky-check/530/
Parameters:
```
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
```
Issue Time Tracking
-------------------
Worklog Id: (was: 732256)
Remaining Estimate: 0h
Time Spent: 10m
> Order of Lineage is flaky in qtest output
> -----------------------------------------
>
> Key: HIVE-25979
> URL: https://issues.apache.org/jira/browse/HIVE-25979
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When running
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
> {code}
> the lineage output of the statement
> {code:java}
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p
> {code}
> is expected to be
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
> but sometimes it is
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}