[
https://issues.apache.org/jira/browse/IMPALA-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manish Maheshwari updated IMPALA-3289:
--------------------------------------
Description:
When disk performance degrades drastically during query execution, Impala does
not detect it and the query appears to "hang". A threshold could be set for
disk I/O performance: when a node falls below it, no more fragments should be
allocated to that node, and the node should be marked as degraded and removed
from the executor list for a defined period, after which it could be retried
if it has recovered.
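The proposed behaviour can be sketched as follows. This is only an illustration of the idea, not Impala code: the class ExecutorHealthTracker, its parameters (min_throughput_mb_s, cooldown_s) and the node names are all hypothetical, and the threshold is assumed to be expressed as disk throughput in MB/s.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExecutorHealthTracker:
    """Hypothetical tracker that excludes executors with degraded disk I/O."""

    min_throughput_mb_s: float  # assumed threshold; not an actual Impala flag
    cooldown_s: float           # how long a degraded node stays excluded
    clock: Callable[[], float] = time.monotonic
    _degraded_until: Dict[str, float] = field(default_factory=dict)

    def report_throughput(self, node: str, mb_per_s: float) -> None:
        """Mark the node as degraded if its throughput is below the threshold."""
        if mb_per_s < self.min_throughput_mb_s:
            self._degraded_until[node] = self.clock() + self.cooldown_s

    def is_schedulable(self, node: str) -> bool:
        """True if the node was never marked or its cooldown has expired."""
        deadline = self._degraded_until.get(node)
        if deadline is None:
            return True
        if self.clock() >= deadline:
            # Cooldown is over: let the scheduler retry the node.
            del self._degraded_until[node]
            return True
        return False

    def schedulable(self, nodes: List[str]) -> List[str]:
        """Filter the executor list down to nodes that may receive fragments."""
        return [n for n in nodes if self.is_schedulable(n)]
```

In this sketch a node that is retried after the cooldown is only excluded again once a new low-throughput report arrives, so a real implementation would need a policy for how quickly a still-degraded node is detected on retry.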
Some error messages -
{code:java}
E0226 07:27:41.546187 14795 tmp-file-mgr.cc:211] 0541dda3dc371844:c22b251700000000] Error for temporary file '/data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87': Disk I/O error on gbrpsr000012838.intranet.barcapint.com:22000: open() failed for /data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87. Disk level I/O error occured. errno=5
W1028 21:00:05.312568 56851 DfsClientShmManager.java:365] EndpointShmManager(DatanodeInfoWithStorage[22.50.92.142:1004,DS-4af8e8f7-c6b6-43e7-8a0a-19d445a7a32e,DISK], parent=ShortCircuitShmManager(2301f5f2)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
impalad.WARNING:W0226 15:15:45.458577 25224 BlockReaderFactory.java:647] 0d43c912dd091557:ab21fb05000000f5] BlockReaderFactory(fileName=/warehouse/datalake/AAAAAAAAA.dat, block=BP-1018268685-35.49.40.158-1438950312819:blk_5003986013_3938223123): unknown response code ERROR while attempting to set up short-circuit access. RegisteredShm(62d9cfb1e2af3c6697ace97f93109c88): slot 125 is already in use.. Short-circuit read for DataNode DatanodeInfoWithStorage[22.50.92.142:1004,DS-ce9c7134-ad13-47fc-93c0-8cec6c3f3e7e,DISK] is disabled temporarily for 1 seconds based on dfs.domain.socket.disable.interval.seconds.
{code}
Ref -
https://github.com/apache/impala/blob/7f190c4625f26cb375c0b0fa504ecb0887a70048/be/src/runtime/io/disk-io-mgr-test.cc#L556
was:
When disk performance degrades drastically during query execution, Impala does
not detect it and the query appears to "hang".
A threshold could be set for disk I/O performance below which the query is
cancelled, alerting the user that there is an issue.
Some error messages -
{code:java}
E0226 07:27:41.546187 14795 tmp-file-mgr.cc:211] 0541dda3dc371844:c22b251700000000] Error for temporary file '/data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87': Disk I/O error on gbrpsr000012838.intranet.barcapint.com:22000: open() failed for /data2/impala/impalad/impala-scratch/0541dda3dc371844:c22b251700000000_6b169f50-3a00-4ce6-a19e-fe9360aaed87. Disk level I/O error occured. errno=5
W1028 21:00:05.312568 56851 DfsClientShmManager.java:365] EndpointShmManager(DatanodeInfoWithStorage[22.50.92.142:1004,DS-4af8e8f7-c6b6-43e7-8a0a-19d445a7a32e,DISK], parent=ShortCircuitShmManager(2301f5f2)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
impalad.WARNING:W0226 15:15:45.458577 25224 BlockReaderFactory.java:647] 0d43c912dd091557:ab21fb05000000f5] BlockReaderFactory(fileName=/warehouse/datalake/AAAAAAAAA.dat, block=BP-1018268685-35.49.40.158-1438950312819:blk_5003986013_3938223123): unknown response code ERROR while attempting to set up short-circuit access. RegisteredShm(62d9cfb1e2af3c6697ace97f93109c88): slot 125 is already in use.. Short-circuit read for DataNode DatanodeInfoWithStorage[22.50.92.142:1004,DS-ce9c7134-ad13-47fc-93c0-8cec6c3f3e7e,DISK] is disabled temporarily for 1 seconds based on dfs.domain.socket.disable.interval.seconds.
{code}
> Disk performance threshold to avoid "hang"
> ------------------------------------------
>
> Key: IMPALA-3289
> URL: https://issues.apache.org/jira/browse/IMPALA-3289
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 2.3.0
> Reporter: Thomas Scott
> Priority: Minor
> Labels: resource-management