[
https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125524#comment-16125524
]
Steve Loughran commented on HADOOP-14770:
-----------------------------------------
# Add the Hadoop version to the JIRA, thanks.
# What is the file format: simple or columnar (ORC, Parquet)?
# It looks like the connection is being closed on every seek. With random IO,
that is a sign of HADOOP-13203 not engaging; on a sequential read, it means
forward reads are aborting and reopening rather than skipping forward.
Make sure you are using the Hadoop 2.8.x JARs, then:
For columnar data, enable random IO:
{code}
spark.hadoop.fs.s3a.experimental.fadvise=random
{code}
For sequential data with big forward skips, set a larger readahead range:
{code}
spark.hadoop.fs.s3a.readahead.range = 768K
{code}
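The two settings above can also be passed per job rather than in a defaults file. A minimal sketch, assuming you launch through spark-submit and its --conf key=value option; the helper name s3a_conf_args and the app.jar placeholder are made up for illustration:

```python
def s3a_conf_args(columnar, readahead="768K"):
    """Build spark-submit --conf arguments for the S3A options suggested above.

    Hypothetical helper, not part of Spark or Hadoop.
    """
    if columnar:
        # Columnar formats (ORC, Parquet) seek a lot, so ask for random IO.
        settings = {"spark.hadoop.fs.s3a.experimental.fadvise": "random"}
    else:
        # Sequential data with big forward skips: widen the readahead range.
        settings = {"spark.hadoop.fs.s3a.readahead.range": readahead}
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # app.jar is a placeholder for your own application.
    print(" ".join(["spark-submit"] + s3a_conf_args(columnar=True) + ["app.jar"]))
```

Try one setting at a time, so it is clear which one (if either) changes the connection behaviour.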
If this fixes it, close as a duplicate of HADOOP-13203.
If this doesn't fix it, print both the input stream and the s3a FS; their
toString() methods print all their stats.
Oh, one more possible cause: split calculation isn't getting it right. Look at
your s3a block size, and the format itself.
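To see why the block size matters for splits, here is a rough sketch of the arithmetic, modelled on the split loop in Hadoop's FileInputFormat (split size clamped between a min and max, with a 1.1 slop factor on the last split). The 32 MiB figure is an assumed fs.s3a.block.size value, so check your own config, and Spark's own readers may size partitions differently:

```python
MIB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Clamp the block size between the configured min and max split sizes.
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size, slop=1.1):
    """Count the splits a file of file_size bytes is cut into."""
    splits = 0
    remaining = file_size
    # Keep cutting full-size splits while more than `slop` splits' worth remains.
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to slop * split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # Assumed values: a 1 GiB file and a 32 MiB block size.
    size = compute_split_size(32 * MIB)
    print(count_splits(1024 * MIB, size))
```

A small block size means many small splits per file, and each split opens its own stream against the store.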
> S3A http connection in s3a driver not reuse in Spark application
> ----------------------------------------------------------------
>
> Key: HADOOP-14770
> URL: https://issues.apache.org/jira/browse/HADOOP-14770
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Yonger
> Assignee: Yonger
>
> I print out connection stats every 2 s when running a Spark application against
> s3-compatible storage:
> ESTAB 0 0 ::ffff:10.0.2.36:44446
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44454
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374
> ::ffff:10.0.2.254:80
> ESTAB 159724 0 ::ffff:10.0.2.36:44436
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44448
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44338
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44438
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44414
> ::ffff:10.0.2.254:80
> ESTAB 0 480 ::ffff:10.0.2.36:44450
> ::ffff:10.0.2.254:80 timer:(on,170ms,0)
> ESTAB 0 0 ::ffff:10.0.2.36:44442
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44390
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44326
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44452
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44394
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44444
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44456
> ::ffff:10.0.2.254:80
> ======================
> ESTAB 0 0 ::ffff:10.0.2.36:44508
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44476
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44524
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44374
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44500
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44504
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44512
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44506
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44464
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44518
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44510
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44442
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44526
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44472
> ::ffff:10.0.2.254:80
> ESTAB 0 0 ::ffff:10.0.2.36:44466
> ::ffff:10.0.2.254:80
> The connections above and below the "=" separator change all the time, but I
> haven't seen this in an MR application.
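The churn can be quantified by comparing local ports across the two snapshots quoted above, where only ports 44374 and 44442 appear on both sides of the separator. A small sketch, assuming input in the same ss-style layout as the quoted output; the function names and the shortened sample data are illustrative only:

```python
import re

# Captures the local port from a line like "ESTAB 0 0 ::ffff:10.0.2.36:44446".
# The peer address sits on the following line and is not matched.
LOCAL_PORT = re.compile(r"ESTAB\s+\d+\s+\d+\s+\S+:(\d+)")


def local_ports(snapshot):
    """Return the set of local ports with an ESTAB connection."""
    return {int(m.group(1)) for m in LOCAL_PORT.finditer(snapshot)}


def reuse_report(before, after):
    """Compare two snapshots: which connections survived, closed, or opened."""
    old, new = local_ports(before), local_ports(after)
    return {
        "kept": sorted(old & new),
        "closed": sorted(old - new),
        "opened": sorted(new - old),
    }


if __name__ == "__main__":
    # A few lines taken from each snapshot in the report.
    before = """
    ESTAB 0 0 ::ffff:10.0.2.36:44446
    ::ffff:10.0.2.254:80
    ESTAB 0 0 ::ffff:10.0.2.36:44374
    ::ffff:10.0.2.254:80
    """
    after = """
    ESTAB 0 0 ::ffff:10.0.2.36:44508
    ::ffff:10.0.2.254:80
    ESTAB 0 0 ::ffff:10.0.2.36:44374
    ::ffff:10.0.2.254:80
    """
    print(reuse_report(before, after))
```

A short "kept" list relative to "closed" and "opened" over a 2 s interval is consistent with connections being torn down and reopened rather than reused.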