[ https://issues.apache.org/jira/browse/HIVE-28639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-28639:
--------------------------------
    Description: 
Since TEZ-4451, we maintain a thread-local snapshot of the IOStatistics in
TaskRunner2Callable, which can be reused in LLAP (due to its thread-local
nature).
Motivation: in cloud environments, the stats provided by
FileSystem.Statistics are not suitable for in-depth debugging. We only have
bytesRead, bytesWritten and similar totals, so in case of throttling and
retries we have no way to tell what led to the performance degradation.
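A minimal sketch of the per-thread aggregation idea, in plain Java. This is not the Hadoop IOStatisticsContext API or the Tez TaskRunner2Callable code; ThreadLocalIOStats, DurationStat, recordDuration() and snapshotAndReset() are hypothetical names used only for illustration. Each thread records counters and duration samples (min/max/mean, the same shape as the minimums/maximums/means sections of the IOStatistics log output) into its own ThreadLocal, and snapshotAndReset() returns that thread's numbers at the end of a task.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not the Hadoop/Tez API: per-thread IO statistics. */
public class ThreadLocalIOStats {

    /** Duration statistic: samples, sum, min, max and mean. */
    static final class DurationStat {
        long samples;
        long sum;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;

        void add(long millis) {
            samples++;
            sum += millis;
            min = Math.min(min, millis);
            max = Math.max(max, millis);
        }

        double mean() {
            return samples == 0 ? 0.0 : (double) sum / samples;
        }
    }

    /** All statistics collected by one thread since its last reset. */
    static final class Stats {
        final Map<String, Long> counters = new TreeMap<>();
        final Map<String, DurationStat> durations = new TreeMap<>();
    }

    // One Stats instance per thread: concurrent tasks never see each other's numbers.
    private static final ThreadLocal<Stats> CURRENT = ThreadLocal.withInitial(Stats::new);

    static void increment(String name, long delta) {
        CURRENT.get().counters.merge(name, delta, Long::sum);
    }

    /** Counts the operation and records its duration under the same name. */
    static void recordDuration(String name, long millis) {
        increment(name, 1);
        CURRENT.get().durations.computeIfAbsent(name, k -> new DurationStat()).add(millis);
    }

    /** Returns this thread's stats and installs a fresh, empty set. */
    static Stats snapshotAndReset() {
        Stats current = CURRENT.get();
        CURRENT.set(new Stats());
        return current;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            recordDuration("action_http_get_request", 17);
            recordDuration("action_http_get_request", 378);
            DurationStat d = snapshotAndReset().durations.get("action_http_get_request");
            System.out.println("worker: samples=" + d.samples + " mean=" + d.mean());
        });
        recordDuration("op_create", 16);
        worker.start();
        worker.join();
        // Only the main thread's own operation shows up here.
        System.out.println("main: counters=" + snapshotAndReset().counters);
    }
}
```

Resetting at snapshot time is the design choice that matters with pooled executor threads: without it, a later task scheduled on the same thread would inherit the earlier task's numbers.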

The proposal is to use the IOStatistics already exposed by Tez to get
stats like:
{code}
query-executor <14>1 2024-11-20T21:46:01.006Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_file_opened=3177)
(action_http_get_request=60363)
(action_http_head_request=3177)
(audit_request_execution=63540)
(audit_span_creation=3178)
(object_metadata_request=3177)
(op_open=3177)
(store_io_request=63540)
(stream_read_bytes=3781596661)
(stream_read_close_operations=3177)
(stream_read_closed=60363)
(stream_read_opened=60363)
(stream_read_operations=1107101)
(stream_read_operations_incomplete=296426)
(stream_read_remote_stream_drain=60363)
(stream_read_seek_policy_changed=3177)
(stream_read_total_bytes=3781596661));

gauges=();

minimums=((action_file_opened.min=10)
(action_http_get_request.min=17)
(action_http_head_request.min=6)
(stream_read_remote_stream_drain.min=0));

maximums=((action_file_opened.max=1200)
(action_http_get_request.max=378)
(action_http_head_request.max=1176)
(stream_read_remote_stream_drain.max=3));

means=((action_file_opened.mean=(samples=3177, sum=47320, mean=14.8946))
(action_http_get_request.mean=(samples=60363, sum=1577407, mean=26.1320))
(action_http_head_request.mean=(samples=3177, sum=46953, mean=14.7790))
(stream_read_remote_stream_drain.mean=(samples=60363, sum=509, mean=0.0084)));

query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
(action_http_head_request.failures=290)
(audit_request_execution=36068)
(audit_span_creation=17857)
(files_created=3584)
(ignored_errors=30)
(object_list_request=17871)
(object_list_request.failures=15)
(object_metadata_request=14578)
(object_put_bytes=2483204365)
(object_put_request=3619)
(object_put_request.failures=40)
(object_put_request_completed=3619)
(op_create=3584)
(op_exists=10704)
(op_mkdirs=3568)
(store_io_request=38021)
(store_io_retry=2018)
(store_io_throttled=310)
(stream_write_block_uploads=3584)
(stream_write_bytes=2460606811)
(stream_write_total_data=4914757178));

gauges=((stream_write_block_uploads_data_pending=3228222)
(stream_write_block_uploads_pending=3584));

minimums=((action_http_head_request.failures.min=5)
(action_http_head_request.min=5)
(object_list_request.failures.min=6)
(object_list_request.min=8)
(object_put_request.failures.min=125)
(object_put_request.min=111)
(op_create.min=16)
(op_exists.min=15)
(op_mkdirs.min=10));

maximums=((action_http_head_request.failures.max=3025)
(action_http_head_request.max=2760)
(object_list_request.failures.max=3005)
(object_list_request.max=5915)
(object_put_request.failures.max=60008)
(object_put_request.max=5596)
(op_create.max=17899)
(op_exists.max=57540)
(op_mkdirs.max=5703));

means=((action_http_head_request.failures.mean=(samples=290, sum=257010, mean=886.2414))
(action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
(object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
(object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
(object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
(object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
(op_create.mean=(samples=3584, sum=831359, mean=231.9640))
(op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
(op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
{code}
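Counters such as store_io_retry, store_io_throttled and the *.failures entries are exactly what FileSystem.Statistics cannot provide. As a hypothetical illustration (the class IOStatsCounters, its methods and the choice of ratios are assumptions, not part of Hive or Hadoop), this sketch extracts the (name=value) pairs from the counters=(...) section of an IOStatistics log line, ignoring the gauges section, and derives a few ratios from values copied from the second log entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derive ratios from an IOStatistics log line. */
public class IOStatsCounters {

    // Matches simple "(name=123)" entries; mean entries like "(x.mean=(samples=...))" do not match.
    private static final Pattern ENTRY = Pattern.compile("\\(([\\w.]+)=(\\d+)\\)");

    /** Parses only the counters=(...) section, stopping before gauges=. */
    static Map<String, Long> parseCounters(String logLine) {
        Map<String, Long> counters = new LinkedHashMap<>();
        int start = logLine.indexOf("counters=");
        if (start < 0) {
            return counters; // no counters section in this line
        }
        int end = logLine.indexOf("gauges=", start);
        String section = end < 0 ? logLine.substring(start) : logLine.substring(start, end);
        Matcher m = ENTRY.matcher(section);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    /** numerator / denominator, or 0.0 when the denominator is missing or zero. */
    static double ratio(Map<String, Long> counters, String numerator, String denominator) {
        long den = counters.getOrDefault(denominator, 0L);
        return den == 0 ? 0.0 : (double) counters.getOrDefault(numerator, 0L) / den;
    }

    public static void main(String[] args) {
        String line = "IOStatistics: counters=((store_io_request=38021)"
                + "(store_io_retry=2018)(store_io_throttled=310)"
                + "(object_put_request=3619)(object_put_request.failures=40));"
                + " gauges=((stream_write_block_uploads_pending=3584));";
        Map<String, Long> c = parseCounters(line);
        System.out.printf("retries per request:   %.4f%n", ratio(c, "store_io_retry", "store_io_request"));
        System.out.printf("throttled per request: %.4f%n", ratio(c, "store_io_throttled", "store_io_request"));
        System.out.printf("PUT failure rate:      %.4f%n",
                ratio(c, "object_put_request.failures", "object_put_request"));
    }
}
```

With the second entry's values this gives roughly 0.053 retries and 0.008 throttled responses per store request, and about 1.1% failed PUTs. Read together with the means section (about 52575 per failed PUT versus about 209 per successful one, assuming the durations are in milliseconds), it points to failed, retried uploads as the place where write time went.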


> Aggregate storage statistics in Hive LLAP
> -----------------------------------------
>
>                 Key: HIVE-28639
>                 URL: https://issues.apache.org/jira/browse/HIVE-28639
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone) 
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>             Fix For: 4.1.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
