[ 
https://issues.apache.org/jira/browse/HIVE-26871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26871:
----------------------------------
    Description: 
The 3 tests in TestCrudCompactorOnTez which use the ProtoLoggingHook run at 
different times. In the failing run, the logs show that they ran at the 
following times - 
Test 1 - 
{code:java}
INFO [main] compactor.TestCrudCompactorOnTez: Current time: 2022-12-15T23:57:44 
{code}
Test 2 - 
{code:java}
INFO [main] compactor.TestCrudCompactorOnTez: Current time: 2022-12-16T00:00:32 
{code}
Test 3 - 
{code:java}
INFO [main] compactor.TestCrudCompactorOnTez: Current time: 2022-12-16T00:04:12 
{code}
As we can see, the tests run on 2 different dates. HiveProtoLoggingHook 
generates a separate event log for every unique date, so 2 event logs are 
created. This is the behaviour of HiveProtoLoggingHook.
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java#L296-L310]
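The rollover can be illustrated with a minimal sketch. It is not the HiveProtoLoggingHook code: the class DatePartitionSketch and its helper methods are hypothetical, and the "date=" naming is only assumed from the paths printed in the logs.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DatePartitionSketch {

    // Hypothetical helper: maps a timestamp to a per-date event log name,
    // mirroring the "date=yyyy-MM-dd" names seen in the test logs.
    static String partitionFor(LocalDateTime time) {
        return "date=" + DateTimeFormatter.ISO_LOCAL_DATE.format(time);
    }

    // Collects the distinct event log names touched by a series of timestamps.
    static Set<String> partitionsFor(List<LocalDateTime> times) {
        Set<String> partitions = new LinkedHashSet<>();
        for (LocalDateTime time : times) {
            partitions.add(partitionFor(time));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // The three "Current time" values reported by the tests.
        List<LocalDateTime> testTimes = List.of(
            LocalDateTime.parse("2022-12-15T23:57:44"),
            LocalDateTime.parse("2022-12-16T00:00:32"),
            LocalDateTime.parse("2022-12-16T00:04:12"));
        // Two distinct names result, because the run crosses midnight.
        System.out.println(partitionsFor(testTimes));
    }
}
```

Any run that crosses midnight yields more than one name, which is why the failure only shows up intermittently.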

However, when generating the log readers, the test side expects a single 
file in the defined log folder.
[https://github.com/apache/hive/blob/master/ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java#L310]
 

Unfortunately, since 2 files are generated, as shown in the logs below, the 
tests fail - 
{code:java}
INFO [main] hooks.TestHiveProtoLoggingHook: List of paths: 
INFO [main] hooks.TestHiveProtoLoggingHook: 
file:/home/jenkins/agent/workspace/internal-hive-flaky-check/itests/hive-unit/target/tmp/junit441259831997042392/junit3438435196942546140/date=2022-12-15
INFO [main] hooks.TestHiveProtoLoggingHook: 
file:/home/jenkins/agent/workspace/internal-hive-flaky-check/itests/hive-unit/target/tmp/junit441259831997042392/junit3438435196942546140/date=2022-12-16
 {code}
The solution is to make _getTestReader()_ in _TestHiveProtoLoggingHook_ 
compatible with the multiple event log file scenario, so that it generates a 
reader for every file present in the folder instead of assuming a single 
file.
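
A rough sketch of the idea follows, using plain java.nio instead of the Hadoop and proto reader classes that the real test uses. The class MultiFileReaderSketch, its methods, and the events.log file name are hypothetical. Walking into sub-folders is also an assumption, made only because the logged paths look like per-date folders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MultiFileReaderSketch {

    // Returns every regular file under the log folder, in a stable order,
    // instead of requiring that exactly one file exists.
    static List<Path> listLogFiles(Path logDir) throws IOException {
        try (Stream<Path> entries = Files.walk(logDir)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    // Opens one reader per file and gathers the events from all of them.
    static List<String> readAllEvents(Path logDir) throws IOException {
        List<String> events = new ArrayList<>();
        for (Path file : listLogFiles(logDir)) {
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    events.add(line);
                }
            }
        }
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Recreate the two-date layout from the logs in a temp folder.
        Path logDir = Files.createTempDirectory("proto-log-sketch");
        for (String date : List.of("date=2022-12-15", "date=2022-12-16")) {
            Path dir = Files.createDirectories(logDir.resolve(date));
            Files.writeString(dir.resolve("events.log"), "event from " + date + "\n");
        }
        System.out.println(listLogFiles(logDir).size() + " files, "
            + readAllEvents(logDir).size() + " events");
    }
}
```

With this shape, assertions in the tests can run against the combined events from all readers, so the outcome no longer depends on whether the run crossed midnight.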

  was:
The 3 tests in TestCrudCompactorOnTez which use the ProtoLoggingHook run at 
different times. Unfortunately, the 3 tests are run at the following times as 
described in the logs - 
Test 1 - 
{code:java}
2022-12-15T16:57:44,294  INFO [main] compactor.TestCrudCompactorOnTez: Current 
time: 2022-12-15T23:57:44 {code}
Test 2 - 
{code:java}
2022-12-15T17:00:32,452  INFO [main] compactor.TestCrudCompactorOnTez: Current 
time: 2022-12-16T00:00:32 {code}
Test 3 - 
{code:java}
2022-12-15T17:04:12,895  INFO [main] compactor.TestCrudCompactorOnTez: Current 
time: 2022-12-16T00:04:12 {code}
As we can see, the tests are run on 2 different dates. Therefore, 
HiveProtoLoggingHook generates a unique event logs for every unique date. This 
is the behaviour of HiveProtoLoggingHook.
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java#L296-L310]

However the expectation from the test side, while generating the log readers is 
that there must be a single file in the log folder defined.
[https://github.com/apache/hive/blob/master/ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java#L310]
 

Unfortunately, since there are 2 files which are generated (as mentioned in the 
logs as well), the following tests fail - 
{code:java}
2022-12-15T17:04:14,837  INFO [main] hooks.TestHiveProtoLoggingHook: List of 
paths: 
2022-12-15T17:04:14,837  INFO [main] hooks.TestHiveProtoLoggingHook: 
file:/home/jenkins/agent/workspace/internal-hive-flaky-check/itests/hive-unit/target/tmp/junit441259831997042392/junit3438435196942546140/date=2022-12-15
2022-12-15T17:04:14,837  INFO [main] hooks.TestHiveProtoLoggingHook: 
file:/home/jenkins/agent/workspace/internal-hive-flaky-check/itests/hive-unit/target/tmp/junit441259831997042392/junit3438435196942546140/date=2022-12-16
 {code}
The solution is to make _getTestReader()_ in _TestHiveProtoLoggingHook_ more 
compatible with multiple event log file scenario and be able to generate 
multiple readers for all files present in the folder instead of fixating on a 
single file clause.


> TestCrudCompactorOnTez is flaky after HIVE-26479
> ------------------------------------------------
>
>                 Key: HIVE-26871
>                 URL: https://issues.apache.org/jira/browse/HIVE-26871
>             Project: Hive
>          Issue Type: Test
>          Components: Test
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
