[ https://issues.apache.org/jira/browse/MAPREDUCE-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994608#comment-14994608 ]
Colin Patrick McCabe commented on MAPREDUCE-6538:
-------------------------------------------------
bq. \[The Java client APIs provide significant advantages that neither
streaming nor pipes provide\]... is a false statement. Partitioning, for
example, can't be done natively in streaming code but can in pipes. In
streaming, you can only provide a Java class.
I agree that supporting partitioning is an advantage of pipes that streaming
doesn't have. There are still advantages that the Java API has over both,
which is the point I was making. I also don't see a fundamental reason why
streaming couldn't be extended to provide this, which would be beneficial to
languages like Python that can't use pipes.
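To make the partitioning point concrete, here is a minimal sketch of what a partitioner computes: it maps each key to one of the reduce tasks. This is only an illustration, not an existing streaming interface. The function name {{partition}} and the use of CRC32 as a stable hash are assumptions made for the example.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Return the index of the reduce task that should receive this key.

    Hypothetical helper: CRC32 is used only because it is stable across
    processes; it is not the hash Hadoop itself uses.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    # crc32 returns an unsigned 32-bit integer in Python 3, so the
    # result of the modulo is always in range(num_reducers).
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```

Every record with the same key lands on the same reducer, which is the property a custom partitioner has to preserve whatever hashing or range scheme it uses.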
bq. Correct. Because if the MR code is being written in C++, why would one use
the less functional streaming API? If one believes that MR jobs consist of
nothing but reading and writing KVs I could see that, but there's a lot more
going on under the hood in more advanced jobs. That functionality is just
flat-out not available in streaming.
I would personally rather use a JVM language, or work with the simple and
clean stdin/stdout paradigm of streaming, than deal with pipes.
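As a rough illustration of the stdin/stdout paradigm, a word-count mapper and reducer for streaming might look like the sketch below. It assumes tab-separated key/value lines and reducer input that arrives sorted by key. The function names and the {{map}}/{{reduce}} command-line argument are hypothetical conventions chosen for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key; input must be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run the script with "map" or "reduce" as its
# only argument; it then reads stdin and writes records to stdout.
STAGES = {"map": map_lines, "reduce": reduce_lines}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STAGES:
    for record in STAGES[sys.argv[1]](sys.stdin):
        print(record)
```

Both stages only read lines and write lines, so they can be tested locally with a shell pipeline and a sort step in between.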
There is a lot of technical debt in pipes. It is hardcoded to output log
messages to stderr using {{fprintf}}. Keys and values need to be serialized to
C++ {{std::string}} objects. It doesn't follow the same coding style as the
other C++ code in Hadoop. It builds at {{\-O0}} and doesn't generate a
{{.so}}, just a {{.a}}. There is no unit test suite, no concept of what the
API is or how it's allowed to change over time, and very little documentation.
[~aw], since you are committed to keeping pipes around, can you please file
follow-on JIRAs for fixing these issues and link them to this JIRA? I will
close this as WONTFIX. We can always revisit this later if things change.
> Deprecate hadoop-pipes
> ----------------------
>
> Key: MAPREDUCE-6538
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6538
> Project: Hadoop Map/Reduce
> Issue Type: Wish
> Components: pipes
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
>
> Development appears to have stopped on hadoop-pipes upstream for the last few
> years, aside from very basic maintenance. Hadoop streaming seems to be a
> better alternative, since it supports more programming languages and is
> better implemented.
> There were no responses to a message on the mailing list asking for users of
> Hadoop pipes... and in my experience, I have never seen anyone use this. We
> should remove it to reduce our maintenance burden and build times.