[ https://issues.apache.org/jira/browse/HADOOP-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992190#comment-14992190 ]
Colin Patrick McCabe commented on HADOOP-12547:
-----------------------------------------------
bq. Why do people use streaming instead of the Java MR API? Or even, why do
people use Java MR instead of streaming?
Because the Java MR API supports only Java (and possibly other JVM languages),
whereas streaming supports Perl, Python, Ruby, C, C++, and any other non-JVM
programming language you can think of.
bq. we have people actually using it
Who specifically is using it? [~kihwal] said he'd check whether they were still
using it, but hasn't come back with that information yet. There was a post on
Stack Overflow where a newcomer tried to use it and failed.
bq. we haven't removed or deprecated MRv1 yet either, and these two seem fairly
tied together given the history of why it exists
Hmm. How are they tied together? It seems like pipes could run against MRv2
as well as MRv1.
In general, the comparisons of hadoop-pipes with MapReduce itself don't seem
fair. Users and customers run MapReduce jobs on a daily basis: for disaster
recovery with DistCp, for benchmarking with TeraGen, TeraSort, TeraValidate, or
the Pi job, and so on. While there are good reasons to write new jobs in Spark,
there are also a lot of MR jobs out there. The same can't be said for
hadoop-pipes, for which we have yet to find a single actual user.
bq. So yeah, I'm definitely -1 at this point.
What specifically are you -1 on? Removal, deprecation, or both?
Can you explain when you would advise one of your customers to use pipes
instead of streaming?
If you feel that pipes is worth maintaining, can you file JIRAs to reinstate
the documentation, fix the compiler warnings, and fix the security bugs?
Thanks.
> Deprecate hadoop-pipes
> ----------------------
>
> Key: HADOOP-12547
> URL: https://issues.apache.org/jira/browse/HADOOP-12547
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
>
> Upstream development on hadoop-pipes appears to have stopped over the last few
> years, aside from very basic maintenance. Hadoop streaming seems to be a
> better alternative, since it supports more programming languages and is
> better implemented.
> No one responded to a message on the mailing list asking for users of
> Hadoop pipes, and in my experience, I have never seen anyone use it. We
> should remove it to reduce our maintenance burden and build times.