Hi, I'm trying to write a Hadoop streaming job in Perl, but I'm completely confused by the key/value separators.
I found lots of separators I can set:

# -jobconf stream.map.output.field.separator=A \
# -jobconf stream.reducer.output.field.separator=B \
# -jobconf mapred.textoutputformat.separator=C \
# -jobconf key.value.separator.in.input.line=D \
# -jobconf stream.map.output.field.separator=A \
# -jobconf stream.reduce.input.field.separator=AA \
# -jobconf stream.reduce.output.field.separator=B \
# -jobconf map.output.key.field.separator=C \

But what do these separators mean?

I tried to use ^A in my job and found this bug <http://issues.apache.org/jira/browse/HADOOP-3341>. It seems to have been fixed in Hadoop 0.19.0, but I still get the following error when I set the separator to ^A:

[Fatal Error] :49:68: Character reference "" is an invalid XML character.
09/11/10 11:10:16 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1167)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1039)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:979)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:381)
    at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1630)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:214)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:372)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:873)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1091)
    ... 19 more

So, does this mean I can't use ^A as the separator?
