[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster

David Mollitor (JIRA) Tue, 05 Mar 2019 06:00:16 -0800


    [ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784463#comment-16784463 ]


David Mollitor commented on MAPREDUCE-207:
------------------------------------------

I recently came across a situation where a user had the LZO compression codec 
enabled in the cluster configuration.  The codec was installed on every cluster 
node.  However, MR jobs that did not even require the codec were failing 
because the codec was not installed on the client node from which the jobs 
were being submitted.  As part of its role in calculating splits, the client 
loads the codec configuration and all of the associated codec 
implementations.  This fails on external clients that do not have the codec 
installed.  The user understandably did not want to install the LZO codec on 
every client node, but avoiding that meant maintaining separate hdfs-site 
files for different client hosts.

Moving all of this work into the cluster removes this dependency from the 
clients.
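
To make the failure mode concrete, here is a minimal sketch in Python.  It is 
a toy model, not Hadoop code: Python modules stand in for codec classes, and 
load_codecs, compute_splits, and the missing module name are all hypothetical.  
It assumes only what is described above, namely that the client eagerly loads 
every configured codec before it calculates splits.  It also ignores how real 
splitting treats compressed files.

```python
import importlib


def load_codecs(configured_codecs):
    """Eagerly load every codec named in a comma-separated list.

    Python modules stand in for codec classes here (an assumption made
    purely for illustration).
    """
    codecs = {}
    for name in (part.strip() for part in configured_codecs.split(",")):
        if not name:
            continue
        try:
            codecs[name] = importlib.import_module(name)
        except ImportError as err:
            # One missing codec aborts the whole load.
            raise RuntimeError(f"Compression codec {name} not found.") from err
    return codecs


def compute_splits(file_lengths, configured_codecs, split_size):
    """Cut each file into (path, offset, length) splits of at most split_size.

    The codec list is loaded first, so a codec that is missing on this
    host fails the call even when no input file uses that codec.
    """
    load_codecs(configured_codecs)
    splits = []
    for path, length in file_lengths.items():
        offset = 0
        while offset < length:
            chunk = min(split_size, length - offset)
            splits.append((path, offset, chunk))
            offset += chunk
    return splits
```

With a codec list of "gzip, zlib", both of which ship with Python, 
compute_splits returns the splits for a plain text input.  Adding a name that 
is not installed on the calling host makes the same call raise before any 
split is produced, even though the input is uncompressed.  Running 
compute_splits on a host that has every configured codec, as the cluster 
nodes do, avoids the error.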

> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Philip Zeyliger
>            Assignee: Gera Shegalov
>            Priority: Major
>         Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, 
> MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, 
> MAPREDUCE-207.v07.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could 
> have a separate "job task type" that computes the input splits, therefore 
> allowing that computation to happen on the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
