[ https://issues.apache.org/jira/browse/TEZ-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated TEZ-4590:
------------------------------
Parent: TEZ-4510
Issue Type: Sub-task (was: Improvement)
> Aggregate some IO statistics
> ----------------------------
>
> Key: TEZ-4590
> URL: https://issues.apache.org/jira/browse/TEZ-4590
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>
> When fs.iostatistics.logging.level=info is set, statistics like the
> following are printed when a FileSystem is closed:
> {code}
> query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0
> query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374
> class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server
> handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
> (action_http_head_request.failures=290)
> (audit_request_execution=36068)
> (audit_span_creation=17857)
> (files_created=3584)
> (ignored_errors=30)
> (object_list_request=17871)
> (object_list_request.failures=15)
> (object_metadata_request=14578)
> (object_put_bytes=2483204365)
> (object_put_request=3619)
> (object_put_request.failures=40)
> (object_put_request_completed=3619)
> (op_create=3584)
> (op_exists=10704)
> (op_mkdirs=3568)
> (store_io_request=38021)
> (store_io_retry=2018)
> (store_io_throttled=310)
> (stream_write_block_uploads=3584)
> (stream_write_bytes=2460606811)
> (stream_write_total_data=4914757178));
> gauges=((stream_write_block_uploads_data_pending=3228222)
> (stream_write_block_uploads_pending=3584));
> minimums=((action_http_head_request.failures.min=5)
> (action_http_head_request.min=5)
> (object_list_request.failures.min=6)
> (object_list_request.min=8)
> (object_put_request.failures.min=125)
> (object_put_request.min=111)
> (op_create.min=16)
> (op_exists.min=15)
> (op_mkdirs.min=10));
> maximums=((action_http_head_request.failures.max=3025)
> (action_http_head_request.max=2760)
> (object_list_request.failures.max=3005)
> (object_list_request.max=5915)
> (object_put_request.failures.max=60008)
> (object_put_request.max=5596)
> (op_create.max=17899)
> (op_exists.max=57540)
> (op_mkdirs.max=5703));
> means=((action_http_head_request.failures.mean=(samples=290, sum=257010,
> mean=886.2414))
> (action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
> (object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
> (object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
> (object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
> (object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
> (op_create.mean=(samples=3584, sum=831359, mean=231.9640))
> (op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
> (op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
> {code}
> The output above is an example in which some S3 throttling kicked in:
> {code}
> (action_http_head_request.failures=290)
> (object_list_request.failures=15)
> ...
> (store_io_retry=2018)
> (store_io_throttled=310)
> {code}
> We need to find a way to aggregate these statistics and pull them into the
> Tez counters.
> Note: a corresponding, separate ticket may be needed for Hive LLAP (if the
> aggregation is separated).
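
One possible shape for the aggregation step, as a minimal sketch and not the ticket's actual design: sum the `counters=(...)` entries from several logged IOStatistics strings into a single map, which could then be copied into Tez counters. The class name `IoStatsAggregator` and its methods are hypothetical, and parsing the logged text is an assumption made only to keep the example self-contained. A real implementation would more likely read the statistics objects directly than re-parse log output.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: sums IOStatistics counters across logged snapshots. */
final class IoStatsAggregator {
    // Matches a single "(name=123)" entry inside the counters section.
    private static final Pattern ENTRY =
        Pattern.compile("\\(([A-Za-z0-9_.]+)=(\\d+)\\)");

    // Running totals, keyed by counter name (sorted for stable output).
    private final Map<String, Long> totals = new TreeMap<>();

    /** Adds the counters found in one logged IOStatistics string. */
    void add(String logged) {
        int start = logged.indexOf("counters=(");
        if (start < 0) {
            return; // no counters section in this string
        }
        // The counters section ends at the first ';' (gauges etc. follow).
        int end = logged.indexOf(';', start);
        String section = end < 0
            ? logged.substring(start)
            : logged.substring(start, end);
        Matcher m = ENTRY.matcher(section);
        while (m.find()) {
            totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
        }
    }

    /** Returns the aggregated counter values. */
    Map<String, Long> totals() {
        return totals;
    }
}
```

Only counters are summed here. Gauges, minimums, maximums and means need different merge rules (for example, min of mins, and recomputing each mean from its samples and sum), so this sketch leaves them out.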