[jira] [Created] (PIG-4533) support of concatenated bz2/gz files

Tomas Hudik (JIRA) Wed, 06 May 2015 03:17:19 -0700

Tomas Hudik created PIG-4533:
--------------------------------

             Summary: support of concatenated bz2/gz files
                 Key: PIG-4533
                 URL: https://issues.apache.org/jira/browse/PIG-4533
             Project: Pig
          Issue Type: Bug
          Components: documentation
            Reporter: Tomas Hudik



The documentation (since at least 0.11.1) says:
http://pig.apache.org/docs/r0.11.1/func.html#handling-compression
_"Note: PigStorage and TextLoader correctly read compressed files as long as 
they are NOT CONCATENATED FILES generated in this manner: ..."_

I doubt this is still true, because:
1. I ran a test: I concatenated some compressed files and processed them. All
the results were identical to the ones produced from the non-concatenated
files. If the documentation were still accurate, they should have differed.
2. JIRA issues https://issues.apache.org/jira/i#browse/HADOOP-4012 and 
https://issues.apache.org/jira/i#browse/HADOOP-6835 say this was fixed in 
Hadoop 0.22 and Hadoop 0.20 respectively, so Hadoop 1 and 2 should both 
support it. I assume Pig does not handle compression on its own but rather 
relies on the hadoop-core (or hadoop-common) libraries.
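
For anyone who wants to see what "concatenated" means at the byte level, here is a minimal sketch in Python. It does not involve Pig or Hadoop; it only illustrates the multi-member format from point 1, and the sample rows and the helper name concat_members are made up for illustration. It contrasts a reader that handles every member with one that stops after the first, which is the difference the documentation note would predict.

```python
import bz2
import gzip
import zlib


def concat_members(chunks, compress):
    """Compress each chunk separately and join the results byte-for-byte,
    which has the same effect as `cat a.gz b.gz > c.gz`."""
    return b"".join(compress(chunk) for chunk in chunks)


# Hypothetical tab-separated rows standing in for two input files.
parts = [b"a\t1\nb\t2\n", b"c\t3\n"]

gz_blob = concat_members(parts, gzip.compress)
bz2_blob = concat_members(parts, bz2.compress)

# Readers that support concatenated streams return the data of every member.
gz_text = gzip.decompress(gz_blob)
bz2_text = bz2.decompress(bz2_blob)

# A reader that stops at the end of the first gzip member sees only the
# first file; the remaining bytes are left over in `unused_data`.
first_member_reader = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
first_only = first_member_reader.decompress(gz_blob)

print(gz_text == b"".join(parts))    # all rows recovered
print(bz2_text == b"".join(parts))   # all rows recovered
print(first_only == parts[0])        # only the first file's rows
```

If Pig's loaders behaved like the last reader, a job over the concatenated file would silently drop rows, so identical results on both inputs suggest the underlying codec reads all members.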

If I'm right, the documentation should be fixed by deleting the part about 
problems with concatenated compressed files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
