Wrapping libxz would actually be quite useful, though (libbz2 as well).

You might look at GZip.jl or Libz.jl for inspiration--it's quite possible
that the library is similarly set up. Clang.jl also might be useful to
produce an initial wrapper.

Cheers, Kevin

On Tuesday, June 10, 2014, Simon Byrne <[email protected]> wrote:

> If you were happy to just compress/uncompress each text file individually
> (i.e. not using tar) you could use GZip.jl:
> https://github.com/kmsquire/GZip.jl
>
> libtar provides a C interface for some tar functionality; however, it's
> worth keeping in mind that tar files are not indexed, and so have very poor
> random access performance (extracting an individual file requires
> sequentially searching through the whole tarball). If you want to use tar,
> you might be better off just untar-ing the whole lot before running your
> julia job, then tar-ing it all back up again after, in which case you might
> as well just use the shell commands.
>
>
> On Tuesday, 10 June 2014 08:37:10 UTC+1, Tomas Lycken wrote:
>>
>> In my thesis, I'm working on a project that produces huge amounts of
>> output in text files - about 25-30GB spread across a million or more files
>> per simulation run. If I compress the files using e.g. `tar --xz --create
>> -f archive.tar.xz tracefiles/` I can reduce the size on disk by a factor
>> of 5-6 or even more. I postprocess all this data in Julia, and reading the
>> data files seems to be a major bottleneck.
>>
>> Has any effort been made toward reading files in these formats in Julia?
>> I've seen [ZipFile](https://github.com/fhs/ZipFile.jl) for handling the
>> .zip format, but unfortunately `zip` isn't available on our cluster, while
>> `tar` is.
>>
>> If there hasn't been any work on this, I might take a stab at it sometime
>> - but first I must finish my thesis...
>>
>> // T
>>
>
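
The unpack, process, re-pack workflow described in the quoted reply could be sketched roughly as follows. This is only an illustration under assumptions: the file names, the stand-in sample data, and the `wc -l` placeholder for the real Julia post-processing step are all hypothetical, and gzip is used instead of xz so the sketch runs where xz is not installed (swap `z` for `J` if your tar supports xz).

```shell
#!/bin/sh
# Sketch only: paths, sample data and the processing step are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in data so the sketch runs on its own.
mkdir tracefiles
printf 'step 1\nstep 2\n' > tracefiles/run1.txt
printf 'step 1\n' > tracefiles/run2.txt
tar -czf archive.tar.gz tracefiles/
rm -r tracefiles

# 1. Unpack everything once, instead of scanning the tarball per file.
tar -xzf archive.tar.gz

# 2. Post-process the plain files (placeholder for the real Julia job).
wc -l tracefiles/*.txt > summary.txt

# 3. Pack the data back up and remove the unpacked copy.
tar -czf archive.tar.gz tracefiles/
rm -r tracefiles

# List what ended up in the archive.
tar -tzf archive.tar.gz
```

The point of unpacking once is that every per-file extraction from a tarball is a sequential scan, so a million individual extractions would each re-read the archive from the start.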
