Hi all,

I am trying to read LZO files. Spark seems to recognize that the input file is compressed and obtains a decompressor:

14/07/03 18:11:01 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/07/03 18:11:01 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ee825cb06b23d3ab97cdd87e13cbbb630bd75b98]
14/07/03 18:11:01 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/07/03 18:11:01 INFO compress.CodecPool: Got brand-new decompressor [.lzo]

But there are two issues:

1. It just gets stuck at this point without doing anything. I waited 15 minutes for a small file.
2. I used hadoop-lzo to create the index so that Spark can split the input across multiple maps, but Spark creates only one mapper.

I am using Python and reading the files with sc.textFile(). The Spark version is git master.

Regards,
Gurvinder
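
A possibly relevant detail for issue 2: sc.textFile() reads through Hadoop's plain TextInputFormat, which treats a .lzo file as a single unsplittable stream and does not consult the hadoop-lzo index. The sketch below shows one way the index-aware input format from hadoop-lzo might be used from Python instead. It assumes the hadoop-lzo jar is on the Spark classpath and that the Spark build provides sc.newAPIHadoopFile(); the helper name read_lzo_lines and the example path are hypothetical. Whether this also explains the hang in issue 1 is not known.

```python
# Sketch only: read LZO text through hadoop-lzo's index-aware input format.
# Assumption: the hadoop-lzo jar (which provides LzoTextInputFormat) is on
# the driver and executor classpath.
LZO_INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"   # byte offset of each line
VALUE_CLASS = "org.apache.hadoop.io.Text"         # the line itself


def read_lzo_lines(sc, path):
    """Return an RDD of text lines from LZO input at `path`.

    `sc` is an existing SparkContext. The input format yields
    (offset, line) pairs, so only the second element is kept.
    """
    pairs = sc.newAPIHadoopFile(path, LZO_INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
    return pairs.map(lambda kv: kv[1])
```

With a live SparkContext this could be called as read_lzo_lines(sc, "hdfs:///some/dir/*.lzo"), where the path is a placeholder. If the index files are picked up, the resulting RDD should report more than one partition for a large enough file.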