[ 
https://issues.apache.org/jira/browse/PIG-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192562#comment-13192562
 ] 

Prashant Kommireddi commented on PIG-2319:
------------------------------------------

Hi Dmitriy, I tested reading a Snappy-compressed file with PigStorage, and it
works just fine.
{code}
grunt> set output.compression.enabled true;                                   
grunt> set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
grunt>  A = load 'input';                                                     
grunt> rmf out;
grunt> STORE A INTO 'out';
{code}
Pig generates a Snappy-compressed file at location "out".

{code}
grunt> C = load 'out';                                                        
grunt> D = LIMIT C 10;                                                        
grunt> DUMP D;        
{code}
The above successfully reads the Snappy-compressed file, because PigStorage
uses the Hadoop TextInputFormat in this case.

However, this does not hold for the temporary files Pig creates between
multiple MR jobs: those are written with the TFile Writer, which supports only
LZO and GZ. Do you see a workaround that would let us support Snappy in this
case?
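
For reference, temporary-file compression is configured separately from the
output settings above. A minimal sketch, assuming the pig.tmpfilecompression
switch alongside the pig.tmpfilecompression.codec property named in the issue
title, and using the "gz" short-hand because only "gz" and "lzo" are accepted
today:
{code}
grunt> -- assumption: these two properties control compression of intermediate files
grunt> set pig.tmpfilecompression true;
grunt> set pig.tmpfilecompression.codec gz;
{code}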
                
> Pig should support snappy as a value for pig.tmpfilecompression.codec
> ---------------------------------------------------------------------
>
>                 Key: PIG-2319
>                 URL: https://issues.apache.org/jira/browse/PIG-2319
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.1, 0.9.1
>            Reporter: Joe Crobak
>
> Utils.tmpFileCompressionCodec() hard-codes support for only "gz" and "lzo" 
> compression.  Since support for snappy was added in HADOOP-7206, it would be 
> nice to allow this codec as well.
> A future-proof solution to this problem might let the user provide a full 
> classname (like in the hadoop settings) or the short-hand, in case the 
> short-hand doesn't exist for a given codec.
