Hi Jörn,
do you have a link where we can read up on this? Or an idea for how we can test it
quickly? Bz2 has the advantage that it can be stream-extracted with bzcat.

How exactly can working with gzipped files be faster than working with uncompressed ones?
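
One quick way to test this might be a small script along the following lines. It is only a sketch under stated assumptions: the sample data is synthetic N-Triples-like text (not a real DBpedia dump), and the names make_sample and benchmark are made up for illustration. It writes the same data uncompressed and as gz, bz2 and xz using Python's standard library, then times a streaming line count over each file.

```python
import bz2
import gzip
import lzma
import os
import tempfile
import time

# One opener per format; all four accept (path, mode) and return a file object.
OPENERS = {"raw": open, "gz": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def make_sample(n_lines=20000):
    """Build synthetic N-Triples-like lines (made-up data, not a real dump)."""
    line = '<http://example.org/r/%d> <http://example.org/p> "value %d" .\n'
    return "".join(line % (i, i) for i in range(n_lines)).encode("utf-8")


def benchmark(data, workdir):
    """Write data once per format, then time a streaming line count."""
    results = {}
    for name, opener in OPENERS.items():
        path = os.path.join(workdir, "sample." + name)
        with opener(path, "wb") as f:
            f.write(data)
        start = time.perf_counter()
        with opener(path, "rb") as f:
            lines = sum(1 for _ in f)  # stream line by line, never load it all
        elapsed = time.perf_counter() - start
        results[name] = (os.path.getsize(path), lines, elapsed)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, (size, lines, secs) in benchmark(make_sample(), tmp).items():
            print(f"{name:>4}: {size:>9} bytes, {lines} lines, {secs:.3f} s")
```

A small sample like this is read back from the OS page cache, so it mostly measures decompression CPU cost. To see whether reduced disk IO can outweigh that cost, the same loop would have to run on a real dump file that is larger than RAM (or after dropping caches), and the numbers will depend on the disk and CPU in use.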
Sebastian 


On 16 September 2016 13:33:22 CEST, "Jörn Hees" <j_h...@cs.uni-kl.de> wrote:
>Hi,
>
>as I mentioned at the DBpedia meetup yesterday, I'd like to discuss the
>motivation for using bz2 as the compression algorithm for the dump files.
>
>bz2 might have the advantage that it's well known, but apart from that
>it's outdated.
>Other compression algorithms (for example xz) compress and decompress
>faster and create smaller file sizes.
>So if the main concern is file-size and bandwidth for the dump files,
>then xz might be a better choice.
>
>If bandwidth is not so much of a concern, I'd love to see the dumps
>being provided as gz files.
>The reason for this is that gzip is also well known, but
>in terms of stream processing it is much closer to the sweet spot of
>spending a bit of CPU to make total IO throughput a lot faster.
>With typical hardware, working on gzipped files is actually faster than
>working with uncompressed ones.
>
>Cheers,
>Jörn
>
>
>------------------------------------------------------------------------------
>_______________________________________________
>DBpedia-developers mailing list
>DBpedia-developers@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/dbpedia-developers

-- 
This message was sent from my Android phone with K-9 Mail.
