[ https://issues.apache.org/jira/browse/HADOOP-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991044#comment-14991044 ]

Colin Patrick McCabe commented on HADOOP-12547:
-----------------------------------------------

bq. [~kihwal] wrote: I think we have internal users who depend on pipes. We 
will find out whether they can move on to streaming. If they can't because of 
any fundamental shortcomings of streaming, we will need to address those.

Thanks for speaking up, [~kihwal].  Part of the reason I filed this jira is to 
find out whether this was being used in the real world.  I do think streaming 
might be better, and am curious if anyone has a good reason in the long term to 
keep using pipes.  Please do speak up if you feel strongly about keeping this.

bq. [~cnauroth] wrote: A more telling problem is the lack of tests. Maybe I'm 
mistaken, but has the documentation vanished too? These are gaps that don't 
speak well to the long-term viability of the component. If we cannot come to 
consensus on removal, then we need to commit to filling those gaps.

I agree. If we keep pipes, we should add some kind of tests and documentation.

bq. [~cnauroth] wrote: As a matter of process, I disagree with adding 
libwebhdfs as a rider to this proposal. I don't think the two are in a 
comparable state.

Yes, I think we should consider each one on its own merits.

bq. However, I do agree that libwebhdfs is a much more viable candidate for 
removal. We have evidence that Pipes was at least used by someone at some time, 
worked correctly, and satisfied its design goals. I don't believe we have any 
evidence that anyone has ever used libwebhdfs, it still doesn't build properly 
in recent releases, and it does not satisfy its design goal of providing a 
library with no JVM dependency. (This can be viewed as just a bug, but there is 
also not overwhelming support for bothering to fix it.)

That's not a fundamental design issue, just a simple bug.  I'm sure that I 
could fix that bug in an hour or two if there is support for doing so.

The reality is that there is no other mainline component that can fill the role 
that libwebhdfs fills.  There are a lot of native clients that might eventually 
be able to do the job if they were merged back to mainline, but so far none of 
them have been.  Like I said earlier, I will drop my objections the moment one 
of the native clients is merged.

I think we should stay focused on hadoop-pipes here.  I'm curious what the 
remaining cases are where hadoop-pipes is a better option than streaming.  If 
there are none, then that strongly suggests we should deprecate it, if not 
remove it.  If we do choose to keep it, then I agree with the discussion here 
that it should be compiled as part of the build and have at least one unit test.

> Deprecate hadoop-pipes
> ----------------------
>
>                 Key: HADOOP-12547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12547
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> Development appears to have stopped on hadoop-pipes upstream for the last few 
> years, aside from very basic maintenance.  Hadoop streaming seems to be a 
> better alternative, since it supports more programming languages and is 
> better implemented.
> There were no responses to a message on the mailing list asking for users of 
> Hadoop pipes... and in my experience, I have never seen anyone use this.  We 
> should remove it to reduce our maintenance burden and build times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
