[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784463#comment-16784463 ]
David Mollitor commented on MAPREDUCE-207:
------------------------------------------
Came across a situation lately where a user had the LZO compression codec
enabled in the cluster. The codec was installed across the cluster. However,
MR jobs that did not even require the codec were failing because the
compression codec was not installed on the client node the jobs were
being submitted from. As part of its role in calculating splits, the
client loads the codec configuration and all of the associated codec
implementations. This fails on external clients that do not have the
codec installed. The user understandably did not want to install the
LZO codec on every client node, but the alternative was maintaining
separate hdfs-site files for different client hosts.
Moving all of this work into the cluster removes this dependency from the
clients.
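A minimal, standalone sketch of the failure mode described above, assuming the client eagerly resolves every class named in a comma-separated codec list (loosely in the spirit of Hadoop's `io.compression.codecs` setting, but this is not Hadoop's code). The class `CodecLoadingSketch`, the method `loadAllCodecs`, and `com.example.MissingLzoCodec` are hypothetical stand-ins; `java.util.zip.Deflater` plays the role of a codec present on every host.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a submitting client resolves every configured codec
// class up front, so one class missing from the client's classpath breaks
// submission even for jobs that never read data in that format.
public class CodecLoadingSketch {

    // Resolve each class name in the comma-separated list; any name that
    // cannot be found on this host aborts the whole load.
    static List<Class<?>> loadAllCodecs(String codecList) {
        List<Class<?>> codecs = new ArrayList<>();
        for (String name : codecList.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                codecs.add(Class.forName(trimmed));
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                        "Codec class not found on this host: " + trimmed, e);
            }
        }
        return codecs;
    }

    public static void main(String[] args) {
        // Same configuration everywhere; only the classpath differs per host.
        // The hypothetical LZO class is "installed" on cluster nodes only.
        String configured = "java.util.zip.Deflater, com.example.MissingLzoCodec";

        // The job only needs Deflater, but eager loading fails on the client.
        try {
            loadAllCodecs(configured);
            System.out.println("client: all codecs loaded");
        } catch (IllegalArgumentException e) {
            System.out.println("client: " + e.getMessage());
        }
    }
}
```

Running the same loop on a cluster node, where every listed class is on the classpath, would succeed, which is why moving split computation into the cluster sidesteps the problem.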
> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
> Key: MAPREDUCE-207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: applicationmaster, mrv2
> Reporter: Philip Zeyliger
> Assignee: Gera Shegalov
> Priority: Major
> Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch,
> MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch,
> MAPREDUCE-207.v07.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could
> have a separate "job task type" that computes the input splits, therefore
> allowing that computation to happen on the cluster.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)