[
https://issues.apache.org/jira/browse/IMPALA-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manish Maheshwari updated IMPALA-3289:
--------------------------------------
Description:
When disk performance degrades drastically during query execution, Impala
does not detect this and the query appears to "hang".
A threshold could be set for disk I/O performance; if throughput falls below
it, the query would be cancelled, alerting the user that there is an issue.
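One way to picture the proposed threshold is a small sliding-window watchdog. This is only a sketch under stated assumptions: the class name `DiskThroughputWatchdog`, the parameters `min_bytes_per_sec` and `window_secs`, and the idea of measuring throughput in bytes per second are all hypothetical and are not existing Impala flags or code.

```python
import time
from collections import deque


class DiskThroughputWatchdog:
    """Hypothetical helper (not an Impala API): reports that a query should be
    cancelled when disk throughput over the last window is below a floor."""

    def __init__(self, min_bytes_per_sec, window_secs, clock=time.monotonic):
        self.min_bytes_per_sec = min_bytes_per_sec  # assumed threshold
        self.window_secs = window_secs              # assumed observation window
        self._clock = clock
        self._started_at = clock()
        self._samples = deque()  # (completion timestamp, bytes transferred)

    def record_io(self, nbytes):
        """Record one completed read or write of nbytes."""
        self._samples.append((self._clock(), nbytes))

    def should_cancel(self):
        """Return True once throughput over the last window is below the floor."""
        now = self._clock()
        # Do not judge the query before a full window has elapsed.
        if now - self._started_at < self.window_secs:
            return False
        # Discard samples that have aged out of the window.
        while self._samples and now - self._samples[0][0] > self.window_secs:
            self._samples.popleft()
        throughput = sum(n for _, n in self._samples) / self.window_secs
        return throughput < self.min_bytes_per_sec
```

In a design like this, `should_cancel()` would probably have to be polled from a thread other than the one doing the I/O, since a thread blocked inside a stalled `open()` or `read()` cannot check anything itself.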
Example error messages:
{code:java}
E0226 07:27:41.546187 14795 tmp-file-mgr.cc:211]
0541dda3dc371844:c22b251700000000] Error for temporary file
'/data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87':
Disk I/O error on gbrpsr000012838.intranet.barcapint.com:22000: open() failed
for
/data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87.
Disk level I/O error occured. errno=5
W1028 21:00:05.312568 56851 DfsClientShmManager.java:365]
EndpointShmManager(DatanodeInfoWithStorage[22.50.92.142:1004,DS-4af8e8f7-c6b6-43e7-8a0a-19d445a7a32e,DISK],
parent=ShortCircuitShmManager(2301f5f2)): error shutting down shm: got
IOException calling shutdown(SHUT_RDWR)
impalad.WARNING:W0226 15:15:45.458577 25224 BlockReaderFactory.java:647]
0d43c912dd091557:ab21fb05000000f5]
BlockReaderFactory(fileName=/warehouse/datalake/AAAAAAAAA.dat,
block=BP-1018268685-35.49.40.158-1438950312819:blk_5003986013_3938223123):
unknown response code ERROR while attempting to set up short-circuit access.
RegisteredShm(62d9cfb1e2af3c6697ace97f93109c88): slot 125 is already in use..
Short-circuit read for DataNode
DatanodeInfoWithStorage[22.50.92.142:1004,DS-ce9c7134-ad13-47fc-93c0-8cec6c3f3e7e,DISK]
is disabled temporarily for 1 seconds based on
dfs.domain.socket.disable.interval.seconds.{code}
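For triage, the glog-style prefix and the `errno` field in messages like the ones above can be pulled out with a short script. This is a minimal sketch: the helper names `parse_glog_prefix` and `describe_errno` are made up for illustration, and the regular expression assumes the standard glog prefix layout (severity letter, month and day, time, thread id, `file:line]`).

```python
import errno
import re

# Standard glog prefix, e.g. "E0226 07:27:41.546187 14795 tmp-file-mgr.cc:211]"
GLOG_PREFIX = re.compile(
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\]"
)
ERRNO_FIELD = re.compile(r"errno=(\d+)")


def parse_glog_prefix(text):
    """Return the glog prefix fields as a dict of strings, or None if absent."""
    match = GLOG_PREFIX.search(text)
    return match.groupdict() if match else None


def describe_errno(text):
    """Return (code, symbolic name) for an 'errno=N' field, or None if absent."""
    match = ERRNO_FIELD.search(text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code, "UNKNOWN")
```

Applied to the first message above, `describe_errno` maps `errno=5` to `EIO`, the generic POSIX I/O error, which points at a failure in the underlying disk rather than a permissions or path problem.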
was:
When disk performance degrades drastically during query execution, Impala
does not detect this and the query appears to "hang".
A threshold could be set for disk I/O performance; if throughput falls below
it, the query would be cancelled, alerting the user that there is an issue.
> Disk performance threshold to avoid "hang"
> ------------------------------------------
>
> Key: IMPALA-3289
> URL: https://issues.apache.org/jira/browse/IMPALA-3289
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 2.3.0
> Reporter: Thomas Scott
> Priority: Minor
> Labels: resource-management
>