If you were happy to just compress/uncompress each text file individually 
(i.e. not using tar), you could use GZip.jl:
https://github.com/kmsquire/GZip.jl
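
A minimal sketch of streaming one gzip-compressed text file with GZip.jl 
(the file name below is just a placeholder; adjust to your layout):

    using GZip

    # Open the compressed file and stream its lines without
    # decompressing it to disk first.
    gzopen("tracefiles/trace_000001.txt.gz", "r") do fh
        for line in eachline(fh)
            # postprocess each line here
        end
    end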

libtar provides a C interface to some tar functionality. However, it's worth 
keeping in mind that tar files are not indexed, and so have very poor 
random-access performance: extracting an individual file requires 
sequentially scanning through the whole tarball. If you want to use tar, 
you might be better off just untar-ing the whole lot before running your 
Julia job, then tar-ing it all back up again afterwards, in which case you 
might as well just use the shell commands.
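
If you do go that route, you can also drive the shell commands from Julia 
itself with run(); a rough sketch, with the archive and directory names as 
placeholders:

    # Unpack everything before postprocessing...
    run(`tar --xz --extract -f archive.tar.xz`)

    # ...read and postprocess the extracted text files here...

    # ...then pack it all up again afterwards.
    run(`tar --xz --create -f archive.tar.xz tracefiles/`)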


On Tuesday, 10 June 2014 08:37:10 UTC+1, Tomas Lycken wrote:
>
> In my thesis, I'm working on a project that produces huge amounts of 
> output in text files - about 25-30GB spread across a million or more files 
> per simulation run. If I compress the files using e.g. `tar --xz --create 
> -f archive.tar.xz tracefiles/` I can reduce the size on disk by a factor 
> of 5-6 or even more. I postprocess all this data in Julia, and reading the 
> data files seems to be a major bottleneck.
>
> Has any effort been made toward reading files in these formats in Julia? 
> I've seen [ZipFile](https://github.com/fhs/ZipFile.jl) for handling the 
> .zip format, but unfortunately `zip` isn't available on our cluster, while 
> `tar` is.
>
> If there hasn't been any work on this, I might take a stab at it sometime 
> - but first I must finish my thesis...
>
> // T
>
