reading compressed lzo files

Gurvinder Singh Thu, 03 Jul 2014 09:25:21 -0700

Hi all,

I am trying to read LZO files. Spark seems to recognize that the input
file is compressed and obtains a decompressor, as the log shows:


14/07/03 18:11:01 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/07/03 18:11:01 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ee825cb06b23d3ab97cdd87e13cbbb630bd75b98]
14/07/03 18:11:01 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/07/03 18:11:01 INFO compress.CodecPool: Got brand-new decompressor [.lzo]

But there are two issues:

1. The job just hangs at this point without doing anything. I waited
15 minutes for a small file.
2. I used hadoop-lzo to create the index so that Spark can split the
input across multiple map tasks, but Spark creates only one mapper.

I am using Python and reading the files with sc.textFile(). The Spark
version is a build from git master.
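
For anyone reproducing this, here is a minimal sketch of the read call
described above next to an index-aware alternative. It assumes the
hadoop-lzo jar and native library are on the Spark classpath and that
`sc` is an existing SparkContext. The helper names are hypothetical;
`com.hadoop.mapreduce.LzoTextInputFormat` is the input format class
shipped with hadoop-lzo, and whether it resolves the single-mapper
behaviour in this setup is an assumption, not something verified here.

```python
# Sketch only: helper names are hypothetical; `sc` is an existing SparkContext.
LZO_INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"  # from hadoop-lzo
KEY_CLASS = "org.apache.hadoop.io.LongWritable"  # byte offset of each line
VALUE_CLASS = "org.apache.hadoop.io.Text"        # the line itself


def read_lzo_plain(sc, path):
    """The read described in the post: plain textFile() on a .lzo path."""
    return sc.textFile(path)


def read_lzo_indexed(sc, path):
    """Read through the hadoop-lzo input format, which consults .index files.

    Returns an RDD of lines; the offset keys are dropped.
    """
    pairs = sc.newAPIHadoopFile(path, LZO_INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
    return pairs.map(lambda kv: kv[1])
```

Usage would look like `lines = read_lzo_indexed(sc, path)` with `path`
pointing at the indexed .lzo file. Comparing the partition count of the
two RDDs should show whether the index is being picked up.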

Regards,
Gurvinder
