[
https://issues.apache.org/jira/browse/IMPALA-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682823#comment-17682823
]
Quanlong Huang commented on IMPALA-11756:
-----------------------------------------
The minor compaction above fails because of a chain of three events:
* The minor compaction is performed by the query-based compactor. It fails
because the session dir has been removed.
* The session dir is removed by the error-handling logic of a background task
that recomputes stats after a previous major compaction.
* The recompute-stats task fails because the underlying table is removed (by
the test cleanup code) just after the major compaction.
Take the logs of another failure as an example. The query-based minor
compaction failed because the temp table dir was not found:
{noformat}
2023-01-19T12:37:55,887 ERROR
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-51_executor]
compactor.Worker: Caught exception while trying to compact
id:15,dbname:partial_catalog_info_test,tableName:insert_only_partitioned,partName:part=1,state:^@,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:jenkins,tooManyAborts:false,hasOldAbort:false,highestWriteId:3,errorMessage:null,workerId:
null,initiatorId: null,retryRetention0. Marking failed to avoid repeated
failures
java.io.FileNotFoundException: File
hdfs://localhost:20500/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe/_tmp_space.db/9096616d-319f-4a0c-b841-2b5e8bbb41db
does not exist.
{noformat}
The HDFS dir "/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe" was
removed by the error-handling logic of the recompute-stats task about 15s
earlier:
{noformat}
2023-01-19T12:37:40,195 ERROR
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
ql.Driver: Failed to run analyze table partial_catalog_info_test.test_full_acid
partition(year='2010',month='10') compute statistics
org.apache.hadoop.hive.ql.processors.CommandProcessorException: null
at ...
2023-01-19T12:37:40,197 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
tez.TezSessionPoolManager: Closing tez session if not default:
sessionId=af680f42-e4ac-44c1-a918-939a7e93ef61, queueName=null, user=jenkins,
doAs=false, isOpen=true, isDefault=false, expires in 587910645ms
2023-01-19T12:37:40,197 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
tez.TezSessionState: Closing Tez Session
2023-01-19T12:37:40,197 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
client.TezClient: Shutting down Tez Session,
sessionName=HIVE-af680f42-e4ac-44c1-a918-939a7e93ef61,
applicationId=application_1674157923248_0015
2023-01-19T12:37:40,201 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
tez.TezSessionState: Attempting to clean up scratchDir for
af680f42-e4ac-44c1-a918-939a7e93ef61 :
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/af680f42-e4ac-44c1-a918-939a7e93ef61
2023-01-19T12:37:40,202 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
tez.TezSessionState: Attempting to clean up resources for
af680f42-e4ac-44c1-a918-939a7e93ef61 :
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/af680f42-e4ac-44c1-a918-939a7e93ef61-resources;
0 additional files, 1 localized resources
2023-01-19T12:37:40,202 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
tez.TezSessionPoolManager: Unregistering session from pool manager.
sessionId=af680f42-e4ac-44c1-a918-939a7e93ef61, queueName=null, user=jenkins,
doAs=false, isOpen=false, isDefault=false, expires in 587910640ms
#OpenTezSessions: 3
2023-01-19T12:37:40,203 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
cleanup.SyncCleanupService: Deleted directory:
/tmp/hive/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe on fs with scheme hdfs
2023-01-19T12:37:40,203 INFO
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
cleanup.SyncCleanupService: Deleted directory:
/tmp/jenkins/b200469b-263b-4e99-b3eb-3d4a983ebcfe on fs with scheme file
{noformat}
The recompute-stats task failed because the underlying files were not found:
{noformat}
2023-01-19T12:37:40,172 ERROR
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
SessionState: Status: Failed
2023-01-19T12:37:40,172 ERROR
[impala-ec2-centos79-m6i-4xlarge-ondemand-07c2.vpc.cloudera.com-48_executor]
SessionState: Vertex failed, vertexName=Map 1,
vertexId=vertex_1674157923248_0015_1_00, diagnostics=[Vertex
vertex_1674157923248_0015_1_00 [Map 1] killed/failed due
to:ROOT_INPUT_INIT_FAILURE, Vertex Input:
partial_catalog_info_test.test_full_acid initializer failed,
vertex=vertex_1674157923248_0015_1_00 [Map 1], java.lang.RuntimeException: ORC
split generation failed with exception: java.io.FileNotFoundException: File
hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/test_full_acid/year=2010/month=10
does not exist.
at ...
Caused by: java.io.FileNotFoundException: File
hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/test_full_acid/year=2010/month=10
does not exist.
at ...{noformat}
We don't rely on Hive to recompute the stats, so we can avoid this failure by
disabling the feature, i.e. setting "hive.compactor.gather.stats" to false.
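As a rough illustration only, and assuming the property is set through the
hive-site.xml used by the Hive services in the test cluster (the patch linked
below may set it in a different place), the entry might look like this:
{code:xml}
<!-- Sketch: stop the compactor from launching the background
     "analyze table ... compute statistics" task after a compaction. -->
<property>
  <name>hive.compactor.gather.stats</name>
  <value>false</value>
</property>
{code}
With the stats task disabled, its error handling no longer runs after the
major compaction, so it cannot remove the session dir that a later minor
compaction still needs.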
Uploaded a patch for review: https://gerrit.cloudera.org/c/19464/
> testPartitionFileMetadataAfterMinorCompaction failing with bad file count
> -------------------------------------------------------------------------
>
> Key: IMPALA-11756
> URL: https://issues.apache.org/jira/browse/IMPALA-11756
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.1.1
> Reporter: Andrew Sherman
> Assignee: Quanlong Huang
> Priority: Critical
>
> In
> org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testPartitionFileMetadataAfterMinorCompaction,
> after a minor file compaction the number of files in the table is 3 when we
> expect to see 1.
> {code}
> java.lang.AssertionError: expected:<1> but was:<3>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testFileMetadataAfterCompaction(PartialCatalogInfoWriteIdTest.java:605)
> at org.apache.impala.catalog.PartialCatalogInfoWriteIdTest.testPartitionFileMetadataAfterMinorCompaction(PartialCatalogInfoWriteIdTest.java:526)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {code}