Re: Feature Request: include sparse - Produce one or more sparse files from an input stream. - for repacking files and streams

Michael Evans Fri, 04 Dec 2009 18:17:09 -0800

Executive summary:

I'm now looking at potentially splitting the functionality and
improvements into other programs/names instead.

Is there already a program that offers fail-over writing or splitting
a single input stream (a complex chain's output as an example) into
differently sized files?

On Fri, Dec 4, 2009 at 5:42 AM, Eric Blake <[email protected]> wrote:
> According to [email protected] on 12/4/2009 2:36 AM:
>> I originally wrote this because I seemed to see a lack of any similar
>> program on 3 different distributions I use.
>
> Thanks for the proposal.  Unfortunately, while you did a great job of
> describing that your code exists, you did a poor job of describing what
> task you need accomplished, and why your code is the only thing that
> appears to be able to achieve your end goal.
>
> Have you tried truncate(1), provided by default in coreutils 7.0 or newer?
>  There has been discussion on this list about making GNU truncate slightly
> more powerful (as in copying fallocate(1) behavior of some other
> implementations).  But it seems like truncate would be the perfect program
> to extend for dealing with sparse files, rather than writing a new one.
> (Hmm, on looking at your source, your comment mentions that you used
> truncate(1) to create the sparse files that the rest of your unit tests
> then manipulate).

Indeed, one of my unit tests uses truncate.  It's good for building
large-addressed files with a small disk footprint, but not so useful
for splitting sparse files apart.  Truncate also fails the test of its
name making sense for its job.
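
For readers who have not seen the effect, here is a minimal sketch of
what "large-addressed, small footprint" means in practice.  It is not
taken from the sparse program or from truncate(1); the helper name
make_sparse_file is made up for illustration, and st_blocks is assumed
to be available (Unix-like systems only).  Whether the file really
occupies little disk space depends on the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of `size` bytes by extending it without writing data.

    Returns (apparent_size, allocated_bytes). On filesystems that support
    holes the allocated figure is usually far below the apparent size.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extend the file; no data blocks are written
    st = os.stat(path)
    # st_blocks counts 512-byte units on Unix-like systems (assumption).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        apparent, allocated = make_sparse_file(os.path.join(tmp, "big.img"), 1 << 30)
        print("apparent size:", apparent, "allocated:", allocated)
```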

I wasn't actually aware of everything cp can do, for the same reason;
I expected a transcription program like dd to have a sparse conv
(I didn't see one, which is why I wrote mine).  I expect cp to -copy-
any existing holes, but not to have an option to scan for sparse
sections; still, it's nice to know it exists, and I can see uses for
it when working with batches of VMs.
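
To make "scanning for sparse sections" concrete, here is a rough sketch
of the general technique: read the input block by block and seek over
blocks that are entirely zero instead of writing them.  This is an
illustration only, not how cp or the sparse program is implemented; the
function name copy_sparse and the 4096-byte block size are assumptions.

```python
import io
import os
import tempfile

BLOCK_SIZE = 4096  # assumed scan granularity, chosen for illustration


def copy_sparse(src, dst_path, block_size=BLOCK_SIZE):
    """Copy the binary stream `src` to `dst_path`, leaving holes for zero blocks.

    Returns the number of bytes consumed from `src`.
    """
    total = 0
    with open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if not any(block):
                # Every byte is zero: skip ahead rather than write.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
            total += len(block)
        # Fix the final length so a trailing run of zeros is not lost.
        dst.truncate(total)
    return total


if __name__ == "__main__":
    data = b"A" * 4096 + b"\0" * 8192 + b"B" * 100
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.img")
        print(copy_sparse(io.BytesIO(data), out), "bytes copied")
```

Because the source is only ever read sequentially, the same loop works
on a pipe, which is the case where an existing file's hole map is not
available.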

Knowing that cp has a sparse option there are only two remaining
unique features my program offers:

1) The -t truncate trailing pad, to shorten the files by disused
sections (though there are better in-place approaches which could be
implemented within truncate).
2) If DD had a sparse conversion flag it could be fully used to
replace the other functionality (section-splitting) within a simple
shell script.
3) The existing split, which I did find while searching for duplicate
names, could be improved to implement multiple sizes.  However I'm not
really sure anyone needs that support who wouldn't already prefer
using the DD method above.  The main idea behind splitting across
multiple devices of various size is device-to-device backup
(especially for lightly compressed input streams) split across
multiple targets.
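
The multiple-sizes idea in point 3 can be sketched roughly as follows:
fill each target up to its own capacity, then move on to the next.
This is a hypothetical illustration, not code from split(1) or from the
sparse program; the name split_stream and the (path, capacity) pair
layout are assumptions.

```python
import io
import os
import tempfile


def split_stream(src, targets, chunk_size=65536):
    """Write the binary stream `src` across `targets` in order.

    `targets` is a list of (path, capacity_in_bytes) pairs. Returns the
    number of bytes written to each target that was opened. Raises OSError
    if the input is larger than the combined capacity.
    """
    written = []
    for path, capacity in targets:
        count = 0
        with open(path, "wb") as out:
            while count < capacity:
                data = src.read(min(chunk_size, capacity - count))
                if not data:
                    # Input exhausted; later targets are left untouched.
                    return written + [count]
                out.write(data)
                count += len(data)
        written.append(count)
    if src.read(1):
        raise OSError("input is larger than the combined capacity of all targets")
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        targets = [(os.path.join(tmp, "a"), 4), (os.path.join(tmp, "b"), 3),
                   (os.path.join(tmp, "c"), 100)]
        print(split_stream(io.BytesIO(b"0123456789"), targets))
```

Note that when the input ends exactly at a target boundary, this sketch
still creates the next target as an empty file.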

There is one advantage to my version, though: it can easily operate on
a single input stream and multiple output devices.  Maybe that
functionality should be moved into a separately named program instead,
especially if fail-over writing were added.  I've not yet needed that,
but I can imagine cases where that would be desirable.

> and we would also need
> copyright assignment to FSF.  Also, your example does not follow the same
> coding conventions as the rest of coreutils, and needs a NEWS blurb and
> info documentation.  If you really want it in coreutils, I'd rewrite it as
> a patch on top of coreutils.git rather than an independent repository.
> Personally, I didn't even spend much time reviewing it for technical
> merit, because it was such a different style and I don't want to be the
> one rewriting it to fit coreutils' style.  In other words, if you want a
> more thorough review, then making your contribution fit the existing mold
> will get you a lot further in having people willing to read it.

Yeah, at this point I can see how it might be better to package the
functionality in a slightly different way; I wasn't originally looking
to submit it to coreutils but realized that it would be nice for
others to have easy access to some of the features.  If I proceed with
some future version of this project I now know where the code will end
up and can check on the desired coding style beforehand.

> If nothing else, your calling convention looks different than cp.  For
> example, where 'cp A B C' copies A and B into directory C, it appears that
> your proposed 'sparse A B C' copies A into sparse files B and C.  I find
> that confusing; it may be better to model your command line syntax after
> cp, since cp already handles sparse files.

Actually the main use, as noted in the manpage, is to operate on
another, non-sparse-aware, program's output.  An additional flag (-p)
must be passed for it to treat the first 'file' as an input source
instead.  All of my unit tests operate on existing files, though, so
that's not quite clear.  I guess that is an extra test I should add if
I make further changes.
