[ 
https://issues.apache.org/jira/browse/IMPALA-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682823#comment-17682823
 ] 

Quanlong Huang commented on IMPALA-11756:
-----------------------------------------

The above minor compaction fails because of a chain of three events:
 * The minor compaction is performed by the query-based compactor. It fails 
because its session dir has been removed.
 * The session dir is removed by the error-handling logic of a background task 
that recomputes stats after a previous major compaction.
 * The recompute-stats task fails because the underlying table is removed (by 
the test cleanup code) just after the major compaction.

Take the logs of another failure as an example. The query-based minor 
compaction failed because the temp table dir was not found:
{noformat}
2023-01-19T12:37:55,887 ERROR 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-51_executor] 
compactor.Worker: Caught exception while trying to compact 
id:15,dbname:partial_catalog_info_test,tableName:insert_only_partitioned,partName:part=1,state:^@,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:jenkins,tooManyAborts:false,hasOldAbort:false,highestWriteId:3,errorMessage:null,workerId:
 null,initiatorId: null,retryRetention0. Marking failed to avoid repeated 
failures
java.io.FileNotFoundException: File 
hdfs://localhost:20500/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe/_tmp_space.db/9096616d-319f-4a0c-b841-2b5e8bbb41db
 does not exist.
{noformat}
HDFS dir "/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe" was removed 
by the error-handling logic of the recompute-stats task about 15s earlier:
{noformat}
2023-01-19T12:37:40,195 ERROR 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
ql.Driver: Failed to run analyze table partial_catalog_info_test.test_full_acid 
partition(year='2010',month='10') compute statistics
org.apache.hadoop.hive.ql.processors.CommandProcessorException: null
        at ...
2023-01-19T12:37:40,197  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
tez.TezSessionPoolManager: Closing tez session if not default: 
sessionId=af680f42-e4ac-44c1-a918-939a7e93ef61, queueName=null, user=jenkins, 
doAs=false, isOpen=true, isDefault=false, expires in 587910645ms
2023-01-19T12:37:40,197  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
tez.TezSessionState: Closing Tez Session
2023-01-19T12:37:40,197  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
client.TezClient: Shutting down Tez Session, 
sessionName=HIVE-af680f42-e4ac-44c1-a918-939a7e93ef61, 
applicationId=application_1674157923248_0015
2023-01-19T12:37:40,201  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
tez.TezSessionState: Attempting to clean up scratchDir for 
af680f42-e4ac-44c1-a918-939a7e93ef61 : 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/af680f42-e4ac-44c1-a918-939a7e93ef61
2023-01-19T12:37:40,202  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
tez.TezSessionState: Attempting to clean up resources for 
af680f42-e4ac-44c1-a918-939a7e93ef61 : 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/af680f42-e4ac-44c1-a918-939a7e93ef61-resources;
 0 additional files, 1 localized resources
2023-01-19T12:37:40,202  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
tez.TezSessionPoolManager: Unregistering session from pool manager. 
sessionId=af680f42-e4ac-44c1-a918-939a7e93ef61, queueName=null, user=jenkins, 
doAs=false, isOpen=false, isDefault=false, expires in 587910640ms 
#OpenTezSessions: 3
2023-01-19T12:37:40,203  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
cleanup.SyncCleanupService: Deleted directory: 
/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe on fs with scheme hdfs
2023-01-19T12:37:40,203  INFO 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
cleanup.SyncCleanupService: Deleted directory: 
/tmp/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe on fs with scheme file
{noformat}
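As a cross-check of the two log excerpts above, here is a minimal Java sketch (not part of the patch; the class and method names are made up for illustration). It confirms that the path in the FileNotFoundException lies under the directory that SyncCleanupService deleted, and computes the gap between the two log timestamps. The only inputs are the paths and timestamps copied from the logs.

```java
import java.net.URI;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the logs; not Impala or Hive code.
class SessionDirTimeline {
    // Timestamp layout used in the log excerpts, e.g. 2023-01-19T12:37:40,203
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long gapMillis(String earlier, String later) {
        return Duration.between(LocalDateTime.parse(earlier, FMT),
                                LocalDateTime.parse(later, FMT)).toMillis();
    }

    // True if the path component of the given URI is strictly below dir.
    static boolean isUnder(String dir, String uri) {
        return URI.create(uri).getPath().startsWith(dir + "/");
    }

    public static void main(String[] args) {
        // Directory reported as deleted by cleanup.SyncCleanupService.
        String deletedDir = "/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe";
        // Path reported as missing by the minor compaction.
        String missing = "hdfs://localhost:20500/tmp/hive/jenkins/"
            + "b200469b-263b-4e99-b3eb-3d4a983ebcfe/_tmp_space.db/"
            + "9096616d-319f-4a0c-b841-2b5e8bbb41db";

        System.out.println(isUnder(deletedDir, missing));            // true
        System.out.println(gapMillis("2023-01-19T12:37:40,203",
                                     "2023-01-19T12:37:55,887"));   // 15684
    }
}
```

The missing temp table dir is inside the deleted session dir, and the deletion was logged about 15.7 seconds before the compaction error.
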
The recompute-stats task failed because the underlying files were not found:
{noformat}
2023-01-19T12:37:40,172 ERROR 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
SessionState: Status: Failed
2023-01-19T12:37:40,172 ERROR 
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor] 
SessionState: Vertex failed, vertexName=Map 1, 
vertexId=vertex_1674157923248_0015_1_00, diagnostics=[Vertex 
vertex_1674157923248_0015_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 
partial_catalog_info_test.test_full_acid initializer failed, 
vertex=vertex_1674157923248_0015_1_00 [Map 1], java.lang.RuntimeException: ORC 
split generation failed with exception: java.io.FileNotFoundException: File 
hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/test_full_acid/year=2010/month=10
 does not exist.
        at ...
Caused by: java.io.FileNotFoundException: File 
hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/test_full_acid/year=2010/month=10
 does not exist.
        at ...{noformat}
We don't rely on Hive to recompute the stats, so we can disable this feature 
by setting "hive.compactor.gather.stats" to false, which avoids the failure.
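
For illustration only, the setting could look like the hive-site.xml fragment below. This assumes the property is set in the configuration of the Hive service that runs the compactor worker; it is not necessarily where or how the patch sets it.

```xml
<!-- Assumption: placed in the hive-site.xml read by the service running the compactor. -->
<property>
  <name>hive.compactor.gather.stats</name>
  <value>false</value>
</property>
```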

Uploaded a patch for review: https://gerrit.cloudera.org/c/19464/

> testPartitionFileMetadataAfterMinorCompaction failing with bad file count
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11756
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11756
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.1.1
>            Reporter: Andrew Sherman
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> In 
> org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testPartitionFileMetadataAfterMinorCompaction,
>  after a minor file compaction the number of files in the table is 3 when we 
> expect to see 1. 
> {code}
> java.lang.AssertionError: expected:<1> but was:<3>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.junit.Assert.assertEquals(Assert.java:631)
>       at 
> org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testFileMetadataAfterCompaction(PartialCatalogInfoWriteIdTest.java:605)
>       at 
> org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testPartitionFileMetadataAfterMinorCompaction(PartialCatalogInfoWriteIdTest.java:526)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>       at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>       at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {code}


