[ https://issues.apache.org/jira/browse/IMPALA-12162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944287#comment-17944287 ]
Quanlong Huang commented on IMPALA-12162:
-----------------------------------------
Here is a stacktrace for this issue:
{noformat}
"Thread-78 [Update catalog for tbl_foo]" #429 prio=5 os_prio=0 cpu=48640.82ms elapsed=234955.84s tid=0x0000000026ec2800 nid=0x187e64 in Object.wait() [0x00007ee66473e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait([email protected]/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait([email protected]/Object.java:328)
    at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1604)
    - waiting to re-lock in wait() <0x00007f084bb107e8> (a org.apache.hadoop.ipc.Client$Call)
    at org.apache.hadoop.ipc.Client.call(Client.java:1562)
    at org.apache.hadoop.ipc.Client.call(Client.java:1459)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:330)
    at jdk.internal.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke([email protected]/Method.java:566)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
    - locked <0x00007f084bb10668> (a org.apache.hadoop.io.retry.RetryInvocationHandler$Call)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
    at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:880)
    at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1869)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1815)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1845)
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1929)
    at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1926)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1938)
    at org.apache.impala.service.CatalogOpExecutor.makeInsertEventData(CatalogOpExecutor.java:7289)
    at org.apache.impala.service.CatalogOpExecutor.prepareInsertEventData(CatalogOpExecutor.java:7226)
    at org.apache.impala.service.CatalogOpExecutor.createInsertEvents(CatalogOpExecutor.java:7175)
    at org.apache.impala.service.CatalogOpExecutor.updateCatalogImpl(CatalogOpExecutor.java:7049)
    at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:6804)
    at org.apache.impala.service.JniCatalog.lambda$updateCatalog$15(JniCatalog.java:488)
    at org.apache.impala.service.JniCatalog$$Lambda$359/0x00007ee670579cb0.call(Unknown Source)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp$$Lambda$214/0x00007ee68e4d10b0.call(Unknown Source)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
    at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:487)
{noformat}
Some logs can be found in IMPALA-13960.
We should consider fetching the checksums in parallel, or fetching them in the
BE after each file is written and passing them to catalogd together.
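
A minimal sketch of the first option (parallel fetching), not the actual
catalogd change. The class name ParallelChecksums, the method fetchAll, the
String-based path/checksum types and the pool size are all hypothetical. In
catalogd the checksumFn argument would presumably wrap the
FileSystem.getFileChecksum() call seen in the stacktrace; here it is a plain
function so the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, for illustration only; not part of Impala.
public class ParallelChecksums {

  /**
   * Applies checksumFn to every path on a fixed-size thread pool and returns
   * the results keyed by path, in the order the paths were given.
   * checksumFn stands in for a per-file HDFS checksum RPC (an assumption).
   */
  public static Map<String, String> fetchAll(List<String> paths,
      Function<String, String> checksumFn, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit all requests first so that they can overlap.
      List<Future<String>> futures = new ArrayList<>();
      for (String path : paths) {
        Callable<String> task = () -> checksumFn.apply(path);
        futures.add(pool.submit(task));
      }
      // Collect in submission order; get() rethrows a task failure
      // wrapped in ExecutionException.
      Map<String, String> result = new LinkedHashMap<>();
      for (int i = 0; i < paths.size(); i++) {
        result.put(paths.get(i), futures.get(i).get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Fake checksum function: sleeps to imitate RPC latency.
    Function<String, String> fakeChecksum = path -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "sum-" + path.length();
    };
    List<String> files = Arrays.asList("/t/f1", "/t/f22", "/t/f333");
    System.out.println(fetchAll(files, fakeChecksum, 3));
  }
}
```

With N files and a pool of N threads, the wall-clock cost is roughly one
round trip instead of N, at the price of more concurrent load on the NameNode,
so the pool size would need to be bounded.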
> makeInsertEventData() can be slow during ACID writes
> ----------------------------------------------------
>
> Key: IMPALA-12162
> URL: https://issues.apache.org/jira/browse/IMPALA-12162
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Csaba Ringhofer
> Priority: Major
>
> Saw some INSERTs where most of the time was spent in
> https://github.com/apache/impala/blob/dc63ae514a445e3f197cab405b01a30c58015695/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L7011
> This was surprising, as I assumed that most of the time in
> updateCatalog()/createInsertEvents() is spent in HMS RPCs, but in the jstacks
> I saw it was mainly spent in calls to HDFS to compute file checksums.