[
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039637#comment-16039637
]
Steve Loughran commented on HADOOP-14475:
-----------------------------------------
bq. 3. That is the issue that confused me. I still don't know why the
filesystem (S3AFileSystem) would be initialized multiple times in an MR job.
Judging from AzureFileSystem and DataNodeMetric, their filesystem and
MetricSystem should only be initialized once.
Every connection to a different bucket gets its own FS instance, with its own
settings; if your mapper or reducer works with more than one bucket, it uses
more than one FS. This is more obvious in things like Hive and Spark, where a
single process handles many requests from different people: there, FS instances
are cached separately for each person as well as for each bucket (have a look
at FileSystem.get()). You'd get the same with Azure if one process tried to
talk to different buckets.
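To make the caching behaviour concrete, here is a minimal sketch of a cache keyed on (scheme, bucket, user). It is a toy model and not Hadoop code: FsCacheSketch, FakeFs and initializations() are hypothetical names invented for illustration, and the real cache behind FileSystem.get() has more to it (for example, caching can be disabled per scheme). The point is only that one process ends up with one instance, and so one initialization, per distinct key.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of a per-(scheme, authority, user) filesystem cache. Not Hadoop code. */
final class FsCacheSketch {

    /** Hypothetical stand-in for a filesystem client such as S3AFileSystem. */
    static final class FakeFs {
        final String description;

        FakeFs(String description) {
            this.description = description;
        }
    }

    private final Map<List<String>, FakeFs> cache = new HashMap<>();
    private int initializations = 0;

    /** Returns the cached instance for this URI and user, creating it on first use. */
    FakeFs get(URI uri, String user) {
        String scheme = uri.getScheme() == null
                ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        // For an s3a:// URI the authority is the bucket name.
        String authority = uri.getAuthority() == null
                ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        // The path is deliberately NOT part of the key.
        List<String> key = Arrays.asList(scheme, authority, user);
        return cache.computeIfAbsent(key, k -> {
            initializations++; // one "initialize()" per distinct key
            return new FakeFs(scheme + "://" + authority + " as " + user);
        });
    }

    /** Number of instances created so far. */
    int initializations() {
        return initializations;
    }

    public static void main(String[] args) {
        FsCacheSketch cache = new FsCacheSketch();
        cache.get(URI.create("s3a://bucket-a/input"), "alice");
        cache.get(URI.create("s3a://bucket-a/output/part-0"), "alice"); // cache hit
        cache.get(URI.create("s3a://bucket-b/input"), "alice");         // new bucket
        cache.get(URI.create("s3a://bucket-a/input"), "bob");           // new user
        System.out.println("instances created: " + cache.initializations());
    }
}
```

In this model, two paths in the same bucket share an instance, while a second bucket or a second user each trigger a fresh initialization. Any per-instance setup, such as registering a metrics source, would therefore run more than once in a single process.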
> Metrics of S3A don't print out when enable it in Hadoop metrics property file
> ------------------------------------------------------------------------------
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017
> x86_64 x86_64 x86_64 GNU/Linux
> cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
> Reporter: Yonger
> Attachments: s3a-metrics.patch1
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xxxxxxxxxx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job that should use S3.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)