[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

Steve Loughran (JIRA) Wed, 21 Jun 2017 02:58:15 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057270#comment-16057270 ]


Steve Loughran commented on HADOOP-14475:
-----------------------------------------

bq. the name change of the context is just to distinguish it from other attributes, such 
as the MetricsRegistry and Metrics names. The following log shows that using 
different names is better than using the same name:
{code}
17/06/05 20:32:54 DEBUG impl.MetricsSinkAdapter: Pushing record 
S3AFileSystemMetrics.s3a.s3afilesystem to file
{code}

We should be OK staying with "S3AFileSystemMetrics" for now.

bq. 2. After I collected the relationships between those classes, I also think 
the functions of class S3AFileSystemMetricsSystem can be merged into some 
existing class, maybe S3AFileSystem.


{{S3AFileSystem}} is *way too big* right now; we've been pulling everything out 
into its own isolated classes wherever possible. It's a losing battle (look at 
the HADOOP-13345 branch), but we try. Generally we're doing this with 
package-private classes which take {{S3AFileSystem owner}} as a constructor 
parameter.
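The extraction pattern described above could look roughly like the following. This is only a sketch: {{S3AFileSystemStandIn}} and {{MetricsHelper}} are hypothetical names invented for the illustration, and the stand-in class replaces the real {{S3AFileSystem}} so the example compiles without Hadoop on the classpath.

```java
import java.util.Objects;

// Hypothetical stand-in for the real S3AFileSystem; it exposes only what
// the sketch needs.
class S3AFileSystemStandIn {
    private final String bucket;

    S3AFileSystemStandIn(String bucket) {
        this.bucket = bucket;
    }

    String getBucket() {
        return bucket;
    }
}

// Hypothetical package-private helper. It takes the owning filesystem as a
// constructor parameter, so this logic lives outside the filesystem class.
class MetricsHelper {
    private final S3AFileSystemStandIn owner;
    private long requests;

    MetricsHelper(S3AFileSystemStandIn owner) {
        this.owner = Objects.requireNonNull(owner, "owner");
    }

    // Count one request made on behalf of the owner.
    void recordRequest() {
        requests++;
    }

    // Summarise the state, reading owner details through the reference.
    String describe() {
        return "bucket=" + owner.getBucket() + " requests=" + requests;
    }
}
```

The helper has no public modifier, so it stays invisible outside its package, and the filesystem class only has to hold a reference to it.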


Regarding instances

* Calls to {{FileSystem.get(URI, conf)}} or {{Path.getFileSystem(conf)}} will 
return the shared FS for that user.
* That is, unless the relevant configuration property to create unique instances 
for every call has been set.
* We like to share FS instances to allow for sharing of thread pools (S3, 
Azure) and IPC channels (HDFS), so unique instances are generally left for when 
you are changing the Configuration settings and really want new instances.
* Ideally an MR/Hive/Spark job should have one instance per user per JVM.
* The MR job can then call {{FileSystem.getStatistics()}} after the run to 
get the statistics for every FS in the JVM, which we can then 
aggregate across the entire job.
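The sharing behaviour in the list above can be modelled with a small cache. This is a toy model, not Hadoop's implementation: {{FsCacheSketch}}, its {{Fs}} class and the {{disableCache}} flag are hypothetical names, and the key used here (scheme, authority, user) is an assumption made to mirror "the shared FS for that user".

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of a per-user filesystem cache (hypothetical, not Hadoop code).
class FsCacheSketch {
    // Stand-in for a filesystem client that would own thread pools etc.
    static final class Fs {
        final URI uri;
        final String user;

        Fs(URI uri, String user) {
            this.uri = uri;
            this.user = user;
        }
    }

    private final Map<List<String>, Fs> cache = new HashMap<>();

    // Returns the shared instance for (scheme, authority, user), or a fresh
    // uncached instance when the caller asks for caching to be bypassed.
    synchronized Fs get(URI uri, String user, boolean disableCache) {
        if (disableCache) {
            return new Fs(uri, user);
        }
        List<String> key = Arrays.asList(
            lower(uri.getScheme()), lower(uri.getAuthority()), user);
        return cache.computeIfAbsent(key, k -> new Fs(uri, user));
    }

    private static String lower(String s) {
        return s == null ? "" : s.toLowerCase(Locale.ROOT);
    }
}
```

In this model two paths in the same bucket resolve to one instance for a given user, a different user gets a different instance, and bypassing the cache yields a new instance on every call.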

What this means is that MR jobs *should* have one S3AFS instance per JVM 
(single-user app and all), but services such as Hive LLAP will have many 
instances, created when queries come in and released afterwards.
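With one instance per JVM, aggregating statistics across a job amounts to summing per-scheme counters reported by each task JVM. A minimal sketch, assuming each JVM reports a map from filesystem scheme to bytes read; {{StatsAggregationSketch}} and that map shape are hypothetical and not part of the Hadoop API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregation of per-JVM, per-scheme byte counts.
class StatsAggregationSketch {
    // Sums the counters from every JVM into one total per scheme.
    static Map<String, Long> aggregate(List<Map<String, Long>> perJvm) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> jvm : perJvm) {
            for (Map.Entry<String, Long> e : jvm.entrySet()) {
                // merge() inserts the value or adds it to the existing sum.
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }
}
```

A long-lived service with many short-lived instances would need to collect these counters before each instance is released, which this sketch does not attempt to cover.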

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-14475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14475
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>         Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>            Reporter: Yonger
>            Assignee: Yonger
>         Attachments: s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xxxxxxxxxx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should be using S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

