[ https://issues.apache.org/jira/browse/TEZ-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated TEZ-4590:
------------------------------
    Description: 
When fs.iostatistics.logging.level=info, statistics like the following are printed when the FileSystem is closed:
{code}
query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
(action_http_head_request.failures=290)
(audit_request_execution=36068)
(audit_span_creation=17857)
(files_created=3584)
(ignored_errors=30)
(object_list_request=17871)
(object_list_request.failures=15)
(object_metadata_request=14578)
(object_put_bytes=2483204365)
(object_put_request=3619)
(object_put_request.failures=40)
(object_put_request_completed=3619)
(op_create=3584)
(op_exists=10704)
(op_mkdirs=3568)
(store_io_request=38021)
(store_io_retry=2018)
(store_io_throttled=310)
(stream_write_block_uploads=3584)
(stream_write_bytes=2460606811)
(stream_write_total_data=4914757178));

gauges=((stream_write_block_uploads_data_pending=3228222)
(stream_write_block_uploads_pending=3584));

minimums=((action_http_head_request.failures.min=5)
(action_http_head_request.min=5)
(object_list_request.failures.min=6)
(object_list_request.min=8)
(object_put_request.failures.min=125)
(object_put_request.min=111)
(op_create.min=16)
(op_exists.min=15)
(op_mkdirs.min=10));

maximums=((action_http_head_request.failures.max=3025)
(action_http_head_request.max=2760)
(object_list_request.failures.max=3005)
(object_list_request.max=5915)
(object_put_request.failures.max=60008)
(object_put_request.max=5596)
(op_create.max=17899)
(op_exists.max=57540)
(op_mkdirs.max=5703));

means=((action_http_head_request.failures.mean=(samples=290, sum=257010, mean=886.2414))
(action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
(object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
(object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
(object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
(object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
(op_create.mean=(samples=3584, sum=831359, mean=231.9640))
(op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
(op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
{code}
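Each {{means=}} entry is a (samples, sum, mean) triple with mean = sum / samples. For example, 257010 / 290 ≈ 886.2414 for {{action_http_head_request.failures.mean}}. This matters for aggregation. Means from different FileSystem instances cannot simply be averaged; their samples and sums have to be added instead. The following is a minimal sketch that assumes a hypothetical {{MeanStat}} class. It is not Hadoop's own type:

{code:java}
/**
 * Hypothetical value type mirroring the (samples, sum, mean) triples printed
 * in the "means=" section of the IOStatistics log. Not a Hadoop class.
 */
final class MeanStat {
    final long samples;
    final long sum;

    MeanStat(long samples, long sum) {
        this.samples = samples;
        this.sum = sum;
    }

    /** Mean over all samples; returns 0 when nothing was sampled. */
    double mean() {
        return samples == 0 ? 0.0 : (double) sum / samples;
    }

    /** Merges by adding samples and sums, never by averaging the two means. */
    MeanStat merge(MeanStat other) {
        return new MeanStat(samples + other.samples, sum + other.sum);
    }
}
{code}

Consider illustrative numbers: 1 sample summing to 100 merged with 99 samples summing to 990 gives 1090 / 100 = 10.9. Averaging the two means would give (100 + 10) / 2 = 55 instead.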

The log above is from a run where some S3 throttling kicked in:
{code}
(action_http_head_request.failures=290)
(object_list_request.failures=15)
...
(store_io_retry=2018)
(store_io_throttled=310)
{code}
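These throttling signals can be picked out mechanically. The minimal sketch below makes two assumptions: the {{counters=}} section has already been parsed into a {{Map<String, Long>}} (parsing is not shown), and it uses a hypothetical {{ThrottlingSignals}} helper. It keeps every non-zero {{*.failures}} counter plus {{store_io_retry}} and {{store_io_throttled}}:

{code:java}
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: selects the counters that point at throttling or
 * failed requests from an already-parsed "counters=" map.
 */
final class ThrottlingSignals {
    private ThrottlingSignals() {}

    static Map<String, Long> extract(Map<String, Long> counters) {
        Map<String, Long> signals = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            String key = e.getKey();
            boolean isSignal = key.endsWith(".failures")
                    || key.equals("store_io_retry")
                    || key.equals("store_io_throttled");
            // Only report signals that actually fired.
            if (isSignal && e.getValue() > 0) {
                signals.put(key, e.getValue());
            }
        }
        return signals;
    }
}
{code}

Applied to the full counters from the log, this selects the three {{.failures}} counters (290, 15, 40) plus {{store_io_retry=2018}} and {{store_io_throttled=310}}.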

We need to find a way to aggregate these statistics and pull them into the Tez counters.
Note: a corresponding, separate ticket may be needed for Hive LLAP (if the aggregation ends up being implemented separately there).
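One possible shape for the aggregation is sketched below. Statistics from several FileSystem instances (or tasks) are folded into flat {{long}} values, which is what a counter can hold:
* Counters are summed.
* Minimums and maximums keep the extreme value.
* Means are carried as {{.samples}}/{{.sum}} pairs so they stay mergeable.

The class name, the merge rules, and the counter naming are assumptions for illustration, not an existing Tez or Hadoop API. Gauges are left out because whether point-in-time values should be summed at all is a design decision.

{code:java}
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical aggregator: folds IOStatistics from several FileSystem
 * instances into name -> long values suitable for counters.
 * Naming and merge rules are illustrative assumptions.
 */
final class IoStatsAggregator {
    private final Map<String, Long> counters = new TreeMap<>();
    private final Map<String, Long> minimums = new TreeMap<>();
    private final Map<String, Long> maximums = new TreeMap<>();
    private final Map<String, long[]> means = new TreeMap<>(); // {samples, sum}

    /** Counters are monotonic totals, so they are summed. */
    void addCounter(String name, long value) {
        counters.merge(name, value, Long::sum);
    }

    /** Keeps the smallest minimum seen across all sources. */
    void addMinimum(String name, long value) {
        minimums.merge(name, value, Long::min);
    }

    /** Keeps the largest maximum seen across all sources. */
    void addMaximum(String name, long value) {
        maximums.merge(name, value, Long::max);
    }

    /** Means are accumulated as raw samples and sums so they remain mergeable. */
    void addMean(String name, long samples, long sum) {
        long[] acc = means.computeIfAbsent(name, k -> new long[2]);
        acc[0] += samples;
        acc[1] += sum;
    }

    /** Flattens everything into name -> long, ready to be written as counters. */
    Map<String, Long> toCounterValues() {
        Map<String, Long> out = new TreeMap<>(counters);
        out.putAll(minimums);
        out.putAll(maximums);
        means.forEach((name, acc) -> {
            out.put(name + ".samples", acc[0]);
            out.put(name + ".sum", acc[1]);
        });
        return out;
    }
}
{code}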

> Aggregate some IO statistics
> ----------------------------
>
>                 Key: TEZ-4590
>                 URL: https://issues.apache.org/jira/browse/TEZ-4590
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
