[jira] [Commented] (HADOOP-19295) S3A: fs.s3a.connection.request.timeout too low for large uploads over slow links

Steve Loughran (Jira) Tue, 01 Oct 2024 07:16:08 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-19295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886195#comment-17886195
 ]


Steve Loughran commented on HADOOP-19295:
-----------------------------------------

Hadoop 3.3.9 and v1 SDK
{code}

> time bin/hadoop fs -put -f 
> /Users/stevel/Projects/Misc/client-validator/downloads/hadoop-3.4.1-RC2/hadoop-3.4.1.t

2024-10-01 15:09:42,797 [shutdown-hook-0] INFO  statistics.IOStatisticsLogging 
(IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: 
counters=((action_http_head_request=7)
(audit_request_execution=47)
(audit_span_creation=8)
(files_copied=1)
(files_copied_bytes=973970699)
(files_created=1)
(files_deleted=2)
(multipart_upload_completed=1)
(multipart_upload_part_put=15)
(object_copy_requests=1)
(object_delete_objects=2)
(object_delete_request=2)
(object_list_request=3)
(object_metadata_request=7)
(object_multipart_initiated=2)
(object_put_bytes=973970699)
(object_put_request_completed=15)
(op_create=1)
(op_delete=1)
(op_get_file_status=3)
(op_get_file_status.failures=1)
(op_glob_status=1)
(op_rename=1)
(store_io_request=47)
(stream_write_block_uploads=30)
(stream_write_bytes=973970699)
(stream_write_queue_duration=150064)
(stream_write_total_data=1947941398)
(stream_write_total_time=868641));

gauges=();

minimums=((action_executor_acquired.min=1)
(action_http_head_request.min=35)
(object_delete_request.min=41)
(object_list_request.min=37)
(object_multipart_initiated.min=75)
(op_create.min=53)
(op_delete.min=51)
(op_get_file_status.failures.min=99)
(op_get_file_status.min=57)
(op_glob_status.min=386)
(op_rename.min=2360));

maximums=((action_executor_acquired.max=46526)
(action_http_head_request.max=365)
(object_delete_request.max=49)
(object_list_request.max=62)
(object_multipart_initiated.max=75)
(op_create.max=53)
(op_delete.max=51)
(op_get_file_status.failures.max=99)
(op_get_file_status.max=370)
(op_glob_status.max=386)
(op_rename.max=2360));

means=((action_executor_acquired.mean=(samples=15, sum=150064, mean=10004.2667))
(action_http_head_request.mean=(samples=7, sum=658, mean=94.0000))
(object_delete_request.mean=(samples=2, sum=90, mean=45.0000))
(object_list_request.mean=(samples=3, sum=155, mean=51.6667))
(object_multipart_initiated.mean=(samples=2, sum=149, mean=74.5000))
(op_create.mean=(samples=1, sum=53, mean=53.0000))
(op_delete.mean=(samples=1, sum=51, mean=51.0000))
(op_get_file_status.failures.mean=(samples=1, sum=99, mean=99.0000))
(op_get_file_status.mean=(samples=2, sum=427, mean=213.5000))
(op_glob_status.mean=(samples=1, sum=386, mean=386.0000))
(op_rename.mean=(samples=1, sum=2360, mean=2360.0000)));


________________________________________________________
Executed in  202.14 secs    fish           external
   usr time   33.49 secs   57.00 micros   33.49 secs
   sys time    4.03 secs  764.00 micros    4.03 secs
{code}


Observations:
* It is much faster
* No timeout

I think the behaviour has somehow changed, either explicitly in the code or in 
how we configure the client.
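
As a rough cross-check of the log above, this sketch derives the mean part size and the end-to-end transfer rate from the logged counters and the timing summary. The variable names are illustrative. The wall-clock figure also covers JVM startup and the rename (copy) step recorded in the statistics, so the rate is only a lower bound on the raw upload rate.

```python
# Values copied from the IOStatistics output and the timing summary above.
object_put_bytes = 973_970_699      # bytes uploaded (object_put_bytes)
multipart_upload_part_put = 15      # number of parts uploaded
wall_clock_secs = 202.14            # "Executed in" figure

MIB = 1024 * 1024

# Mean size of one uploaded part, in MiB.
mean_part_mib = object_put_bytes / multipart_upload_part_put / MIB

# Bytes uploaded divided by total wall-clock time, in MiB/s.
rate_mib_per_sec = object_put_bytes / wall_clock_secs / MIB

print(f"mean part size:  {mean_part_mib:.1f} MiB")
print(f"end-to-end rate: {rate_mib_per_sec:.2f} MiB/s")
```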


> S3A: fs.s3a.connection.request.timeout too low for large uploads over slow 
> links
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-19295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19295
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0, 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The value of {{fs.s3a.connection.request.timeout}} (default = 60s) is too low 
> for large uploads over slow connections.
> I suspect something changed between the v1 and v2 SDK versions: put used to 
> be exempt from the normal timeouts. It no longer is, which now surfaces as 
> failures to upload 1+ GB files over slower network connections. Smaller (for 
> example 128 MB) files work.
> The parallel queuing of writes in the S3ABlockOutputStream is helping create 
> this problem as it queues multiple blocks at the same time, so per-block 
> bandwidth becomes (available bandwidth)/(number of blocks); four blocks cut 
> each block's capacity to a quarter.
> The fix is straightforward: use a much bigger timeout. I'm going to propose 
> 15 minutes. We need to strike a balance between upload time allocation and 
> other requests timing out.
> I do worry about other consequences; we've found that timeout exceptions are 
> happy to hide the underlying causes of retry failures, so in fact this may be 
> better for everything except a server hanging after the HTTP request is 
> initiated.
> Too bad we can't alter the timeout for different requests.
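
The bandwidth arithmetic in the description can be made concrete with a small sketch. The block size, the link speeds and the helper name `block_upload_secs` are illustrative assumptions, not values taken from this issue; only the 60s default and the proposed 15 minutes come from the text. It also assumes that the link is shared evenly between the blocks in flight and that the timeout covers the whole upload of a block, as the description suggests.

```python
def block_upload_secs(block_bytes: int, link_bytes_per_sec: float,
                      active_blocks: int) -> float:
    """Seconds to upload one block when active_blocks uploads share the link evenly."""
    per_block_rate = link_bytes_per_sec / active_blocks
    return block_bytes / per_block_rate


MIB = 1024 * 1024
BLOCK_BYTES = 64 * MIB             # assumed block size
DEFAULT_TIMEOUT_SECS = 60          # default quoted in the description
PROPOSED_TIMEOUT_SECS = 15 * 60    # value proposed in the description

for link_mib_per_sec in (8, 2, 0.5):       # hypothetical link speeds
    for active in (1, 4):                  # blocks uploading in parallel
        secs = block_upload_secs(BLOCK_BYTES, link_mib_per_sec * MIB, active)
        print(f"{link_mib_per_sec} MiB/s, {active} block(s): {secs:.0f}s "
              f"within_default={secs <= DEFAULT_TIMEOUT_SECS} "
              f"within_proposed={secs <= PROPOSED_TIMEOUT_SECS}")
```

Under these assumptions a block that fits inside 60s when uploaded alone can exceed it once four blocks share the same link, while still finishing well inside 15 minutes.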



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
