Wrapping libxz would actually be quite useful, though (as would libbz2). You might look at GZip.jl or Libz.jl for inspiration; it's quite possible that the library is set up similarly. Clang.jl might also be useful for generating an initial wrapper.
Cheers,
Kevin

On Tuesday, June 10, 2014, Simon Byrne wrote:

> If you're happy to just compress/uncompress each text file individually
> (i.e. not using tar), you could use GZip.jl:
> https://github.com/kmsquire/GZip.jl
>
> libtar provides a C interface for some tar functionality. However, it's
> worth keeping in mind that tar files are not indexed, and so have very poor
> random-access performance (extracting an individual file requires
> sequentially searching through the whole tarball). If you want to use tar,
> you might be better off untarring the whole lot before running your
> Julia job, then tarring it all back up again afterwards, in which case you
> might as well just use the shell commands.
>
>
> On Tuesday, 10 June 2014 08:37:10 UTC+1, Tomas Lycken wrote:
>>
>> In my thesis, I'm working on a project that produces huge amounts of
>> output in text files: about 25-30 GB spread across a million or more
>> files per simulation run. If I compress the files using e.g.
>> `tar --xz --create -f archive.tar.xz tracefiles/`, I can reduce the size
>> on disk by a factor of 5-6 or even more. I postprocess all this data in
>> Julia, and reading the data files seems to be a major bottleneck.
>>
>> Has any effort been made toward reading files in these formats in Julia?
>> I've seen [ZipFile](https://github.com/fhs/ZipFile.jl) for handling the
>> .zip format, but unfortunately `zip` isn't available on our cluster, while
>> `tar` is.
>>
>> If there hasn't been any work on this, I might take a stab at it sometime,
>> but first I must finish my thesis...
>>
>> // T
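Simon's "untar once, run the job, re-tar" suggestion can be scripted. Below is a minimal POSIX shell sketch. It assumes a tar that understands `--xz` (as Tomas's command implies) and that the tarball has a single top-level directory. The helper name `process_archive` and the example file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: unpack an xz-compressed tarball once, run a job on the plain
# files, then re-pack. Assumes a tar with --xz support and the xz tools.
set -eu

# process_archive ARCHIVE DIR CMD [ARGS...]   (hypothetical helper)
#   ARCHIVE : xz-compressed tarball whose top-level directory is DIR
#   CMD ... : command to run; the extracted DIR is appended as its last argument
process_archive() {
    archive=$1
    dir=$2
    shift 2
    workdir=$(mktemp -d)

    # One sequential pass over the whole tarball instead of per-file lookups.
    tar --xz -xf "$archive" -C "$workdir"

    # Run the job on plain files, e.g. CMD = julia postprocess.jl
    "$@" "$workdir/$dir"

    # Re-pack into a temporary file first so that a failed tar cannot
    # clobber the original archive, then replace it.
    tar --xz -cf "$archive.new" -C "$workdir" "$dir"
    mv "$archive.new" "$archive"

    rm -rf "$workdir"
}
```

Usage would look like `process_archive tracefiles.tar.xz tracefiles julia postprocess.jl`, where `postprocess.jl` stands in for your own script. If the Julia job only reads the data, you can drop the re-pack step and simply delete the scratch directory.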
