This depends on the container you're using. SequenceFiles compressed with Snappy are easy to detect, since the header of such files carries the name of the codec class used, so readers can instantiate the right codec to decompress with.
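To illustrate the point about the header, here is a minimal Python sketch that reads the codec class name out of a SequenceFile header. It assumes the version-6 header layout that current Hadoop writers emit: the "SEQ" magic, a version byte, the key and value class names, two compression flags, and then the codec class name when compression is on. Strings are assumed to be encoded as a Hadoop variable-length integer byte count followed by UTF-8 bytes. The names read_vint, read_text and read_seqfile_header are hypothetical helpers for this example, not Hadoop APIs, and the metadata and sync marker that follow the codec name are not parsed.

```python
import io
import struct
from typing import BinaryIO, Optional, Union

SEQ_MAGIC = b"SEQ"
SUPPORTED_VERSION = 6  # assumption: only the version-6 header layout is handled


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail on a truncated header."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated SequenceFile header")
    return data


def read_vint(stream: BinaryIO) -> int:
    """Decode a Hadoop WritableUtils-style variable-length integer."""
    first = struct.unpack(">b", read_exact(stream, 1))[0]  # signed first byte
    if first >= -112:
        return first  # values in [-112, 127] fit in the first byte alone
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # total bytes, incl. first
    value = 0
    for b in read_exact(stream, size - 1):  # big-endian magnitude bytes
        value = (value << 8) | b
    return value ^ -1 if negative else value


def read_text(stream: BinaryIO) -> str:
    """Read a string written as a vint byte length followed by UTF-8 bytes."""
    return read_exact(stream, read_vint(stream)).decode("utf-8")


def read_seqfile_header(source: Union[bytes, BinaryIO]) -> dict:
    """Return the class names and compression settings from a SequenceFile header."""
    stream = io.BytesIO(source) if isinstance(source, (bytes, bytearray)) else source
    if read_exact(stream, 3) != SEQ_MAGIC:
        raise ValueError("not a SequenceFile: missing 'SEQ' magic")
    version = read_exact(stream, 1)[0]
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported SequenceFile header version {version}")
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = read_exact(stream, 1) != b"\x00"
    block_compressed = read_exact(stream, 1) != b"\x00"
    # The codec class name is only written when compression is enabled.
    codec: Optional[str] = read_text(stream) if compressed else None
    return {
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

For a Snappy-compressed SequenceFile (say a hypothetical local copy named part-00000), `with open("part-00000", "rb") as f: print(read_seqfile_header(f)["codec"])` should print org.apache.hadoop.io.compress.SnappyCodec, which is exactly the information a reader needs to pick the decompressor.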
However, since Snappy is just a compression codec and does not provide a container format (http://code.google.com/p/snappy/issues/detail?id=34), there is currently no way to "detect" whether a file or stream is Snappy-encoded unless the full stream is available to test against (for example, via Python's snappy.isValidCompressed). If you're using Snappy today, it's best used for intermediate map output, and within other container formats such as Hadoop SequenceFiles and Avro data files.

On Sun, Apr 15, 2012 at 6:02 PM, JAX <jayunit...@gmail.com> wrote:
> Hi guys : related to the last snappy question - how does Hadoop detect Snappy
> compression in the input dataset ( how does Hadoop
> Know when to decompress records via snappy ).
>
> Jay Vyas
> MMSB
> UCHC

--
Harsh J
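To make concrete why a bare Snappy stream can only be recognised by testing all of it, here is a pure-Python sketch of the kind of check python-snappy's isValidCompressed performs. It is an illustrative stand-in written from the raw Snappy format description, not that library's code, and the names read_varint32 and is_valid_raw_snappy are hypothetical. A raw Snappy stream starts with the uncompressed length as a little-endian varint, followed by literal and back-reference (copy) elements, so validity is only settled after walking every element and checking that the output sizes add up to the declared length. It targets raw Snappy only: to my understanding, data written through Hadoop's SnappyCodec streams carries extra length-prefixed block framing that would have to be stripped first. Even a pass is only a heuristic, since some arbitrary byte strings (b"\x00", a valid empty stream, for one) are well-formed.

```python
from typing import Optional, Tuple


def read_varint32(data: bytes, pos: int) -> Optional[Tuple[int, int]]:
    """Decode Snappy's little-endian base-128 length preamble; return (value, next_pos)."""
    result = 0
    for i in range(5):  # a 32-bit value needs at most 5 varint bytes
        if pos >= len(data):
            return None
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << (7 * i)
        if b < 0x80:  # high bit clear: last byte of the varint
            return (result, pos) if result <= 0xFFFFFFFF else None
    return None


def is_valid_raw_snappy(data: bytes) -> bool:
    """Walk every element of a raw Snappy stream without decompressing it."""
    header = read_varint32(data, 0)
    if header is None:
        return False
    expected, pos = header
    produced = 0  # bytes a decompressor would have emitted so far
    n = len(data)
    while pos < n:
        tag = data[pos]
        pos += 1
        kind = tag & 0x3
        if kind == 0:  # literal: the next `length` input bytes are emitted verbatim
            value = tag >> 2
            if value >= 60:  # length-1 stored in (value - 59) extra little-endian bytes
                extra = value - 59
                if pos + extra > n:
                    return False
                value = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            length = value + 1
            if pos + length > n:
                return False
            pos += length
        else:  # copy: repeat `length` bytes starting `offset` bytes back in the output
            if kind == 1:  # 1-byte offset form (11-bit offset, length 4..11)
                if pos + 1 > n:
                    return False
                length = 4 + ((tag >> 2) & 0x7)
                offset = ((tag >> 5) << 8) | data[pos]
                pos += 1
            else:  # kind 2 or 3: 2-byte or 4-byte little-endian offset
                width = 2 if kind == 2 else 4
                if pos + width > n:
                    return False
                length = 1 + (tag >> 2)
                offset = int.from_bytes(data[pos:pos + width], "little")
                pos += width
            if offset == 0 or offset > produced:
                return False  # back-reference before the start of the output
        produced += length
        if produced > expected:
            return False
    # Only after consuming the whole input can the total be checked.
    return produced == expected
```

For example, b"\x05\x10hello" (declared length 5, one 5-byte literal) passes, while b"\x05\x10hell" fails even though it is an exact prefix of the valid stream, which is why a partial read is not enough to decide.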