[ 
https://issues.apache.org/jira/browse/HADOOP-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992190#comment-14992190
 ] 

Colin Patrick McCabe commented on HADOOP-12547:
-----------------------------------------------

bq. Why do people use streaming instead of the Java MR API? Or even, why do 
people use Java MR instead of streaming?

Because the Java MR API only supports Java (and possibly other JVM languages), 
whereas streaming supports Perl, Python, Ruby, C, C++, and any other non-JVM 
programming language you can think of.

bq. we have people actually using it

Who specifically is using it?  [~kihwal] said he'd check whether they were 
still using it, but hasn't returned with that information yet.  There was a 
post on Stack Overflow where a newbie tried to use it and failed.

bq. we haven't removed or deprecated MRv1 yet either, and these two seem fairly 
tied together given the history of why it exists

Hmm.  How are they tied together?  It seems like pipes could run against MRv2 
as well as MRv1.

In general, the comparisons of hadoop-pipes with mapreduce itself don't seem 
fair.  Users and customers run mapreduce jobs on a daily basis: for disaster 
recovery with DistCp, to benchmark with Teragen, Teraread, Teravalidate, or the 
Pi jobs, and so on.  While there are good reasons to write new jobs in Spark, 
there are also a lot of MR jobs out there.  The same can't be said for 
hadoop-pipes, for which we are still searching for an actual user.

bq. So yeah, I'm definitely -1 at this point.

What specifically are you -1 on?  Removal, deprecation, or both?

Can you explain when you would advise one of your customers to use pipes 
instead of streaming?

If you feel that pipes is worth maintaining, can you file JIRAs to reinstate 
the documentation, fix the compiler warnings, and fix the security bugs?

Thanks.

> Deprecate hadoop-pipes
> ----------------------
>
>                 Key: HADOOP-12547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12547
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> Development appears to have stopped on hadoop-pipes upstream for the last few 
> years, aside from very basic maintenance.  Hadoop streaming seems to be a 
> better alternative, since it supports more programming languages and is 
> better implemented.
> There were no responses to a message on the mailing list asking for users of 
> Hadoop pipes... and in my experience, I have never seen anyone use this.  We 
> should remove it to reduce our maintenance burden and build times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
