[ https://issues.apache.org/jira/browse/HADOOP-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988229#comment-14988229 ]
Colin Patrick McCabe commented on HADOOP-12547:
-----------------------------------------------
Thank you for the perspective, [~aw]. It's true that you have been around
longer than I have. However, it's also true that in about 4 years of supporting
customer Hadoop deployments I have never once seen anyone use or ask about
Hadoop Pipes. We've gotten requests for some pretty obscure things-- like
adding a feature or fixing a bug in fuse_dfs, supporting the old obsolete MR1
framework, preparing native code patches for decades-old versions of AIX, or
even running Hadoop on JVMs that I'm convinced most people have never heard
of. But __never__ for pipes.
That Stack Overflow post looks like a newbie stumbling into Hadoop for the
first time and trying to follow a tutorial from more than 5 years ago... and
failing, because this stuff hasn't been maintained-- and won't be maintained in
the future. That's hardly a ringing endorsement of keeping this around.
Anyway, nobody is proposing removing this from 2.6 or any branch-2 release...
only from trunk.
bq. Pipes was written primarily for Yahoo!'s search team. It was provided as a
way for C code to interface with MapReduce without requiring significant
rewrites. It was definitely in use before I left Yahoo! but I haven't kept
track of whether it is still being used. My guess is no, given most of that
team has left/was shipped over to Microsoft.
[~daryn], [~kihwal], do you have any perspective on this? Is there any reason
to keep this around in trunk / branch-3.0? If we are going to keep this, I
would like to see some unit tests, documentation, and actual maintenance.
> Remove hadoop-pipes
> -------------------
>
> Key: HADOOP-12547
> URL: https://issues.apache.org/jira/browse/HADOOP-12547
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
>
> Development appears to have stopped on hadoop-pipes upstream for the last few
> years, aside from very basic maintenance. Hadoop streaming seems to be a
> better alternative, since it supports more programming languages and is
> better implemented.
> There were no responses to a message on the mailing list asking for users of
> Hadoop pipes... and in my experience, I have never seen anyone use this. We
> should remove it to reduce our maintenance burden and build times.