[ https://issues.apache.org/jira/browse/PIG-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192562#comment-13192562 ]
Prashant Kommireddi commented on PIG-2319:
------------------------------------------

Hi Dmitriy,

I tested reading a Snappy-compressed file with PigStorage, and it works fine.

{code}
grunt> set output.compression.enabled true;
grunt> set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
grunt> A = load 'input';
grunt> rmf out;
grunt> STORE A INTO 'out';
{code}

Pig generates a Snappy-compressed file at location "out".

{code}
grunt> C = load 'out';
grunt> D = LIMIT C 10;
grunt> DUMP D;
{code}

The above successfully reads the Snappy-compressed file, because PigStorage uses the Hadoop TextInputFormat in this case.

However, this is not the case for the temporary files Pig creates between multiple MR jobs, because those are written with the TFile Writer, which supports only LZO and GZ. Do you see a workaround we could use to support Snappy in this case?

> Pig should support snappy as a value for pig.tmpfilecompression.codec
> ---------------------------------------------------------------------
>
>                 Key: PIG-2319
>                 URL: https://issues.apache.org/jira/browse/PIG-2319
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.1, 0.9.1
>            Reporter: Joe Crobak
>
> Utils.tmpFileCompressionCodec() hard-codes support for only "gz" and "lzo"
> compression. Since support for snappy was added in HADOOP-7206, it would be
> nice to allow this codec as well.
> A future-proof solution to this problem might let the user provide a full
> classname (like in the hadoop settings) or the short-hand, in case the
> short-hand doesn't exist for a given codec.
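
To make the "short-hand or full classname" idea from the issue description concrete, here is a minimal sketch of how such a lookup might behave. It is not Pig's actual Utils.tmpFileCompressionCodec() code: the class name TmpCodecResolver, the resolve() method, and the short-hand table are hypothetical, and the sketch deliberately avoids any Hadoop dependency by returning class names as plain strings. It also only covers name resolution; it would not by itself make the TFile Writer accept Snappy.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a pig.tmpfilecompression.codec value to a codec class name.
final class TmpCodecResolver {

    // Assumed short-hand table. "gz" and "lzo" are the values the issue says are
    // hard-coded today; "snappy" is the proposed addition.
    private static final Map<String, String> SHORT_HANDS = new HashMap<>();
    static {
        SHORT_HANDS.put("gz", "org.apache.hadoop.io.compress.GzipCodec");
        SHORT_HANDS.put("lzo", "com.hadoop.compression.lzo.LzoCodec");
        SHORT_HANDS.put("snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    }

    private TmpCodecResolver() {}

    // Returns the codec class name for a short-hand, or the value itself when it
    // looks like a fully qualified class name (contains a dot).
    static String resolve(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("No codec specified");
        }
        String trimmed = value.trim();
        String known = SHORT_HANDS.get(trimmed.toLowerCase(Locale.ROOT));
        if (known != null) {
            return known;
        }
        if (trimmed.indexOf('.') >= 0) {
            // Treated as a full class name; a real implementation would still have
            // to load the class and check that it is a usable compression codec.
            return trimmed;
        }
        throw new IllegalArgumentException("Unknown codec short-hand: " + trimmed);
    }

    public static void main(String[] args) {
        System.out.println(resolve("snappy"));
        System.out.println(resolve("org.example.MyCodec")); // made-up class name
    }
}
```

With a lookup like this, existing "gz" and "lzo" settings would keep working, while a codec without a short-hand could still be named by its full class name.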