[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637767#comment-14637767 ]
Robert Kanter commented on MAPREDUCE-6415:
------------------------------------------
{quote}Maybe I'm missing it, but why is this being written in bash instead of
as an actual yarn application? The JVM startup costs are going to be
massive.{quote}
The 'hadoop archive' command starts up a JVM. I don't see how we can get
around that unless we call it programmatically from an existing JVM and also do
it serially, which is going to take a lot longer overall.
I figured it would be simpler to use the DistributedShell, which already
exists and does most of what we need, than to write a whole new AM that creates
containers to run 'hadoop archive'.
{quote}Also, is there something that is guaranteeing that HADOOP_HOME is
set?{quote}
The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be
defined for the NM, so it ends up in the env of the shell.
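If we want the script to fail fast when that assumption does not hold, a guard
along these lines could run first. This is only a minimal sketch, not part of
the patch: `require_hadoop_home` is a hypothetical helper name, and the
`bin/hadoop` location assumes the usual layout of a Hadoop install.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the patch): verify that HADOOP_HOME came through
# from the NodeManager's environment before trying to run 'hadoop archive'.
require_hadoop_home() {
  # ${HADOOP_HOME:-} expands to an empty string when the variable is unset,
  # so this check is also safe under 'set -u'.
  if [[ -z "${HADOOP_HOME:-}" ]]; then
    echo "HADOOP_HOME is not set; cannot locate the hadoop launcher" >&2
    return 1
  fi

  # Assumption: the launcher lives at bin/hadoop under HADOOP_HOME.
  if [[ ! -x "${HADOOP_HOME}/bin/hadoop" ]]; then
    echo "No executable found at ${HADOOP_HOME}/bin/hadoop" >&2
    return 1
  fi

  return 0
}
```

The script would call `require_hadoop_home || exit 1` before its first use of
`"${HADOOP_HOME}/bin/hadoop"`, so a misconfigured NM shows up as a clear error
in the container's stderr instead of a "command not found".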
I wasn't aware of shellcheck before, but that looks like a really useful tool.
I'll fix those.
> Create a tool to combine aggregated logs into HAR files
> -------------------------------------------------------
>
> Key: MAPREDUCE-6415
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.8.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: HAR-ableAggregatedLogs_v1.pdf,
> MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch
>
>
> While we wait for YARN-2942 to become viable, it would still be great to
> improve the aggregated logs problem. We can write a tool that combines
> aggregated log files into a single HAR file per application, which should
> solve the too-many-files and too-many-blocks problems. See the design
> document for details.
> See YARN-2942 for more context.