[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781095#comment-16781095 ]
Hadoop QA commented on MAPREDUCE-5018:
--------------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 6m 18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 3s{color} | {color:orange} root: The patch generated 8 new + 2 unchanged - 0 fixed = 10 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 36s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 9s{color} | {color:red} hadoop-common-project_hadoop-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 32s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 38s{color} | {color:green} hadoop-streaming in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | MAPREDUCE-5018 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12644886/MAPREDUCE-5018.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle shellcheck shelldocs |
| uname | Linux 7b8f827152b6 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0d61fac |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| shellcheck | v0.4.6 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7588/artifact/out/diff-checkstyle-root.txt |
| javadoc | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7588/artifact/out/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7588/testReport/ |
| Max. process+thread count | 1608 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-tools/hadoop-streaming U: . |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7588/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Support raw binary data with Hadoop streaming
> ---------------------------------------------
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: contrib/streaming
> Affects Versions: 1.1.2
> Reporter: Jay Hacker
> Assignee: Steven Willis
> Priority: Minor
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5018-branch-1.1.patch, MAPREDUCE-5018.patch,
> MAPREDUCE-5018.patch, justbytes.jar, mapstream
>
>
> People often have a need to run older programs over many files, and turn to
> Hadoop streaming as a reliable, performant batch system. There are good
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs,
> and so needs to interpret the data passing through it. Unfortunately, this
> makes it difficult to use Hadoop streaming with programs that don't deal in
> key/value pairs, or with binary data in general. For example, something as
> simple as running md5sum to verify the integrity of files will not give the
> correct result, due to Hadoop's interpretation of the data.
> There have been several attempts at binary serialization schemes for Hadoop
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed
> at efficiently encoding key/value pairs, and not passing data through
> unmodified. Even the "RawBytes" serialization scheme adds length fields to
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently,
> the only way I can do this on the raw data is to copy the data out and run
> the filter on one machine, which is inconvenient, slow, and unreliable. It
> would be very convenient to run the filter as a map-only job, allowing me to
> build on existing (well-tested!) building blocks in the Unix tradition
> instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to
> process whole files; and of course many expect raw binary input and output.
> The solution is to run a map-only job with an InputFormat and OutputFormat
> that just pass raw bytes and don't split. It turns out to be a little more
> complicated with streaming; I have attached a patch with the simplest
> solution I could come up with. I call the format "JustBytes" (as "RawBytes"
> was already taken), and it should be usable with most recent versions of
> Hadoop.
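To make the approach in the quoted description concrete, here is a minimal sketch of a non-splitting, whole-file, raw-bytes InputFormat of the kind it describes. This is illustrative only and is not the attached patch: the class names (e.g. WholeFileBytesInputFormat) are invented for the example, it assumes the classic org.apache.hadoop.mapred API that streaming uses, and it omits the matching OutputFormat and the streaming-side glue that the description notes makes the streaming case "a little more complicated".

{code:java}
// Illustrative sketch only -- not the attached MAPREDUCE-5018 patch.
// Each input file becomes exactly one record: a single raw byte array.
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileBytesInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // external filters expect whole files, so never split
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, conf);
  }

  static class WholeFileRecordReader
      implements RecordReader<NullWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean done = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    @Override
    public boolean next(NullWritable key, BytesWritable value) throws IOException {
      if (done) {
        return false;
      }
      Path path = split.getPath();
      FileSystem fs = path.getFileSystem(conf);
      // Simplification: read the entire file into memory as one record.
      byte[] contents = new byte[(int) split.getLength()];
      FSDataInputStream in = fs.open(path);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      value.set(contents, 0, contents.length);
      done = true;
      return true;
    }

    @Override public NullWritable createKey() { return NullWritable.get(); }
    @Override public BytesWritable createValue() { return new BytesWritable(); }
    @Override public long getPos() { return done ? split.getLength() : 0; }
    @Override public float getProgress() { return done ? 1.0f : 0.0f; }
    @Override public void close() { }
  }
}
{code}

Handing each file to the mapper as a single unsplit record is what lets an external filter such as md5sum see the bytes exactly as stored; the trade-off in this simplified form is that every file must fit in a mapper's memory.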
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)