StreamXMLRecordReader does not support gzipped files
----------------------------------------------------
Key: HADOOP-3562
URL: https://issues.apache.org/jira/browse/HADOOP-3562
Project: Hadoop Core
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.17.0
Reporter: Bo Adler
I am using Hadoop Streaming to analyze Wikipedia data files, which are in XML
format and are compressed because they are so large. In preliminary tests, I
found that StreamXMLRecordReader does not work with gzipped data files: the
data is fed into the mapper script as raw data.
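
For illustration only, here is a minimal Python sketch of the behavior one might
expect: decompress the gzip stream first, then cut records on begin/end markers.
This is not Hadoop's implementation. The function name iter_records, the
<page>/</page> markers, and the sample data are assumptions made up for this
example.

```python
import gzip
import io


def iter_records(stream, begin, end, chunk_size=64 * 1024):
    """Yield each byte span from `begin` through `end` found in `stream`.

    `stream` is any binary file-like object; pass a gzip.GzipFile (or the
    result of gzip.open) so that markers are searched in decompressed data.
    """
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        # Emit every complete record currently held in the buffer.
        while True:
            start = buf.find(begin)
            if start < 0:
                # Keep only a tail that could hold a partial begin marker.
                keep = len(begin) - 1
                buf = buf[-keep:] if keep > 0 else b""
                break
            stop = buf.find(end, start + len(begin))
            if stop < 0:
                # Record is incomplete; wait for more data.
                buf = buf[start:]
                break
            stop += len(end)
            yield buf[start:stop]
            buf = buf[stop:]
        if not chunk:
            return  # end of stream


if __name__ == "__main__":
    # Hypothetical sample standing in for a gzipped Wikipedia dump.
    xml = b"<mediawiki><page>a</page>\n<page>b</page></mediawiki>"
    compressed = gzip.compress(xml)
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        for record in iter_records(gz, b"<page>", b"</page>"):
            print(record.decode("utf-8"))
```

For a file on disk, gzip.open(path, "rb") could be passed in place of the
in-memory stream. Scanning the compressed bytes directly would generally find
no markers, which is why decompression has to happen before record splitting.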