[ https://issues.apache.org/jira/browse/HIVE-28639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-28639:
--------------------------------
Description:
Since TEZ-4451, we maintain a thread-local snapshot of the IOStatistics in
TaskRunner2Callable. Because the snapshot is thread-local, it can be reused
in LLAP as well.
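To illustrate why a thread-local snapshot suits LLAP, here is a minimal, stdlib-only Java sketch. Everything in it is hypothetical: `ThreadLocalStatsSketch`, `increment` and `finishTask` are illustrative names, not the Tez `TaskRunner2Callable` or Hadoop IOStatistics API. It assumes that executor threads are reused across tasks. Each task's counters accumulate on its own thread. When the task ends, they are snapshotted, merged into a daemon-wide aggregate, and cleared before the thread picks up its next task.

{code}
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: executor threads record storage counters into a
 * thread-local map; when a task finishes, its snapshot is merged into a
 * daemon-wide aggregate and the thread-local state is cleared.
 * Class and method names are illustrative, not Tez/Hive/Hadoop APIs.
 */
public class ThreadLocalStatsSketch {

  // Counters of the task currently running on this thread.
  private static final ThreadLocal<Map<String, Long>> TASK_STATS =
      ThreadLocal.withInitial(() -> new TreeMap<String, Long>());

  // Daemon-wide aggregate, updated concurrently by all executor threads.
  static final ConcurrentHashMap<String, AtomicLong> DAEMON_TOTALS =
      new ConcurrentHashMap<>();

  // Called by (hypothetical) filesystem instrumentation on the task thread.
  static void increment(String counter, long delta) {
    TASK_STATS.get().merge(counter, delta, Long::sum);
  }

  // Called when a task completes: snapshot, aggregate, then reset.
  static Map<String, Long> finishTask() {
    Map<String, Long> snapshot = new TreeMap<>(TASK_STATS.get());
    snapshot.forEach((k, v) ->
        DAEMON_TOTALS.computeIfAbsent(k, key -> new AtomicLong()).addAndGet(v));
    // Executor threads are reused, so drop this task's state.
    TASK_STATS.remove();
    return snapshot;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executors = Executors.newFixedThreadPool(2);
    for (int task = 0; task < 4; task++) {
      executors.submit(() -> {
        increment("op_open", 1);
        increment("stream_read_bytes", 1024);
        finishTask();
      });
    }
    executors.shutdown();
    executors.awaitTermination(10, TimeUnit.SECONDS);
    DAEMON_TOTALS.forEach((k, v) -> System.out.println(k + "=" + v.get()));
  }
}
{code}

Running main submits four tasks to two reused threads and prints op_open=4 and stream_read_bytes=4096, in no guaranteed order. Without the remove() call, a reused thread would carry one task's counters into the next.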
Motivation: in cloud environments, the stats provided by
FileSystem.Statistics are not detailed enough for in-depth debugging. They only
cover totals such as bytesRead and bytesWritten, so when throttling and retries
occur, we have no way to tell what led to the performance degradation.
The proposal is to use the IOStatistics already provided by Tez to get
stats like:
{code}
query-executor <14>1 2024-11-20T21:46:01.006Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_file_opened=3177)
(action_http_get_request=60363)
(action_http_head_request=3177)
(audit_request_execution=63540)
(audit_span_creation=3178)
(object_metadata_request=3177)
(op_open=3177)
(store_io_request=63540)
(stream_read_bytes=3781596661)
(stream_read_close_operations=3177)
(stream_read_closed=60363)
(stream_read_opened=60363)
(stream_read_operations=1107101)
(stream_read_operations_incomplete=296426)
(stream_read_remote_stream_drain=60363)
(stream_read_seek_policy_changed=3177)
(stream_read_total_bytes=3781596661));
gauges=();
minimums=((action_file_opened.min=10)
(action_http_get_request.min=17)
(action_http_head_request.min=6)
(stream_read_remote_stream_drain.min=0));
maximums=((action_file_opened.max=1200)
(action_http_get_request.max=378)
(action_http_head_request.max=1176)
(stream_read_remote_stream_drain.max=3));
means=((action_file_opened.mean=(samples=3177, sum=47320, mean=14.8946))
(action_http_get_request.mean=(samples=60363, sum=1577407, mean=26.1320))
(action_http_head_request.mean=(samples=3177, sum=46953, mean=14.7790))
(stream_read_remote_stream_drain.mean=(samples=60363, sum=509, mean=0.0084)));
query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
(action_http_head_request.failures=290)
(audit_request_execution=36068)
(audit_span_creation=17857)
(files_created=3584)
(ignored_errors=30)
(object_list_request=17871)
(object_list_request.failures=15)
(object_metadata_request=14578)
(object_put_bytes=2483204365)
(object_put_request=3619)
(object_put_request.failures=40)
(object_put_request_completed=3619)
(op_create=3584)
(op_exists=10704)
(op_mkdirs=3568)
(store_io_request=38021)
(store_io_retry=2018)
(store_io_throttled=310)
(stream_write_block_uploads=3584)
(stream_write_bytes=2460606811)
(stream_write_total_data=4914757178));
gauges=((stream_write_block_uploads_data_pending=3228222)
(stream_write_block_uploads_pending=3584));
minimums=((action_http_head_request.failures.min=5)
(action_http_head_request.min=5)
(object_list_request.failures.min=6)
(object_list_request.min=8)
(object_put_request.failures.min=125)
(object_put_request.min=111)
(op_create.min=16)
(op_exists.min=15)
(op_mkdirs.min=10));
maximums=((action_http_head_request.failures.max=3025)
(action_http_head_request.max=2760)
(object_list_request.failures.max=3005)
(object_list_request.max=5915)
(object_put_request.failures.max=60008)
(object_put_request.max=5596)
(op_create.max=17899)
(op_exists.max=57540)
(op_mkdirs.max=5703));
means=((action_http_head_request.failures.mean=(samples=290, sum=257010, mean=886.2414))
(action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
(object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
(object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
(object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
(object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
(op_create.mean=(samples=3584, sum=831359, mean=231.9640))
(op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
(op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
{code}
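As a worked example of what this extra detail provides, the sketch below pulls counters out of the sample output and derives ratios that FileSystem.Statistics cannot give. In the second entry, for instance, store_io_retry=2018 out of store_io_request=38021 means roughly 5% of store requests were retried. It also shows how a mean entry relates to its (samples, sum) pair. This is a stdlib-only sketch. The class and method names (`IoStatsRatios`, `parseCounters`, `ratio`, `MeanStat`) are hypothetical, and the parsing rule is inferred from the log above rather than from a documented format.

{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the IOStatistics log excerpt above; not a Hadoop API. */
public class IoStatsRatios {

  // Matches plain counter entries such as "(store_io_retry=2018)". Mean entries
  // like "(x.mean=(samples=40, sum=..., mean=...))" are skipped because their
  // value is not a bare integer followed by ')'.
  private static final Pattern COUNTER = Pattern.compile("\\(([a-z_.]+)=(\\d+)\\)");

  static Map<String, Long> parseCounters(String section) {
    Map<String, Long> counters = new LinkedHashMap<>();
    Matcher m = COUNTER.matcher(section);
    while (m.find()) {
      counters.put(m.group(1), Long.parseLong(m.group(2)));
    }
    return counters;
  }

  // numerator / denominator, or 0.0 when the denominator is missing or zero.
  static double ratio(Map<String, Long> c, String numerator, String denominator) {
    long d = c.getOrDefault(denominator, 0L);
    long n = c.getOrDefault(numerator, 0L);
    return d == 0 ? 0.0 : (double) n / d;
  }

  // Illustrative (samples, sum) pair mirroring "mean=(samples=..., sum=..., mean=...)".
  static final class MeanStat {
    final long samples;
    final long sum;

    MeanStat(long samples, long sum) {
      this.samples = samples;
      this.sum = sum;
    }

    double mean() {
      return samples == 0 ? 0.0 : (double) sum / samples;
    }

    // Merging adds samples and sums, so the result is weighted by sample count
    // (averaging two means directly would not be).
    MeanStat merge(MeanStat other) {
      return new MeanStat(samples + other.samples, sum + other.sum);
    }
  }

  public static void main(String[] args) {
    // Excerpt of the counters from the second log entry above.
    String counters = "counters=((object_put_request=3619)"
        + "(object_put_request.failures=40)(store_io_request=38021)"
        + "(store_io_retry=2018)(store_io_throttled=310));";
    Map<String, Long> c = parseCounters(counters);
    System.out.printf("retries per store request:   %.4f%n",
        ratio(c, "store_io_retry", "store_io_request"));
    System.out.printf("throttled per store request: %.4f%n",
        ratio(c, "store_io_throttled", "store_io_request"));
    System.out.printf("failed PUT share:            %.4f%n",
        ratio(c, "object_put_request.failures", "object_put_request"));

    // object_put_request means from the same entry: successes and failures.
    MeanStat ok = new MeanStat(3579, 748058);
    MeanStat failed = new MeanStat(40, 2103009);
    System.out.printf("PUT mean (ok / failed / all): %.4f / %.4f / %.4f%n",
        ok.mean(), failed.mean(), ok.merge(failed).mean());
  }
}
{code}

The merge keeps (samples, sum) rather than the printed mean, so statistics from many tasks can be combined without losing the weighting. This is what makes the 40 slow failed PUTs visible next to the 3579 fast successful ones.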
> Aggregate storage statistics in Hive LLAP
> -----------------------------------------
>
> Key: HIVE-28639
> URL: https://issues.apache.org/jira/browse/HIVE-28639
> Project: Hive
> Issue Type: Improvement
> Security Level: Public(Viewable by anyone)
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Fix For: 4.1.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)