Executive summary: I'm looking at splitting the functionality and improvements into other programs (or names) instead.
Is there already a program that offers fail-over writing, or that splits a single input stream (for example, the output of a complex pipeline) into differently sized files?

On Fri, Dec 4, 2009 at 5:42 AM, Eric Blake <[email protected]> wrote:
> According to [email protected] on 12/4/2009 2:36 AM:
>> I originally wrote this because I seemed to see a lack of any similar
>> program on 3 different distributions I use.
>
> Thanks for the proposal. Unfortunately, while you did a great job of
> describing that your code exists, you did a poor job of describing what
> task you need accomplished, and why your code is the only thing that
> appears to be able to achieve your end goal.
>
> Have you tried truncate(1), provided by default in coreutils 7.0 or newer?
> There has been discussion on this list about making GNU truncate slightly
> more powerful (as in copying fallocate(1) behavior of some other
> implementations). But it seems like truncate would be the perfect program
> to extend for dealing with sparse files, rather than writing a new one.
> (Hmm, on looking at your source, your comment mentions that you used
> truncate(1) to create the sparse files that the rest of your unit tests
> then manipulate).

Indeed, one of my unit tests uses truncate. It's good for building large-addressed files with a small disk footprint, but not so useful for splitting sparse files apart. Truncate also fails the test of its name making sense for that job.

I wasn't actually aware of everything cp can do, for the same reason: I expected a transcription program like dd to have a sparse conversion (I didn't see one, which is why I wrote mine). I expect cp to copy any existing holes, but not to have an option to scan for sparse sections; still, it's nice to know that option exists, and I can see uses for it when working with batches of VMs.
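For reference, the hole-scanning behaviour discussed above (what cp's sparse option does, and what a sparse conversion in dd would do) can be sketched in a few lines of Python. This is a minimal illustration, not cp's or my program's actual code: the name `sparse_copy` and the 4096-byte block size are assumptions, and it relies on the filesystem leaving a hole where a write lands past a skipped region.

```python
def sparse_copy(src, dst, block_size=4096):
    """Copy binary stream src to the writable file object dst.

    All-zero blocks are skipped with a seek instead of being written, so on
    filesystems that support sparse files they end up as holes. Returns the
    number of bytes actually written to dst.
    """
    zero = bytes(block_size)
    offset = 0   # position in the input (and therefore in the output)
    written = 0  # bytes that really had to be written
    while True:
        block = src.read(block_size)
        if not block:
            break
        if block != zero[:len(block)]:
            # Non-zero data: seek to its position and write it.
            dst.seek(offset)
            dst.write(block)
            written += len(block)
        offset += len(block)
    # Set the output to the exact input length; trailing all-zero blocks
    # become a hole instead of written data.
    dst.truncate(offset)
    return written
```

Since only sequential reads are needed on the input side, the same function works on a pipe, e.g. `sparse_copy(sys.stdin.buffer, out)` with `out` opened in `"wb"` mode.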
Knowing that cp has a sparse option, there are only two remaining unique features my program offers, plus one note on split:

1) The -t option truncates trailing padding, shortening the files by their unused sections (though there are better in-place approaches, which could be implemented within truncate).
2) Section-splitting. If dd had a sparse conversion flag, it could fully replace this functionality within a simple shell script.
3) The existing split, which I did find while searching for duplicate names, could be improved to support multiple sizes. However, I'm not really sure anyone needs that support who wouldn't already prefer the dd method above.

The main idea behind splitting across multiple devices of various sizes is device-to-device backup (especially for lightly compressed input streams) spread across multiple targets. There is one advantage to my version, though: it can easily operate on a single input stream and multiple output devices. Maybe that functionality should move to a separately named program instead, especially if fail-over writing were added. I've not yet needed that, but I can imagine cases where it would be desirable.

> and we would also need
> copyright assignment to FSF. Also, your example does not follow the same
> coding conventions as the rest of coreutils, and needs a NEWS blurb and
> info documentation. If you really want it in coreutils, I'd rewrite it as
> a patch on top of coreutils.git rather than an independent repository.
> Personally, I didn't even spend much time reviewing it for technical
> merit, because it was such a different style and I don't want to be the
> one rewriting it to fit coreutils' style. In other words, if you want a
> more thorough review, then making your contribution fit the existing mold
> will get you a lot further in having people willing to read it.
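The single-stream, multiple-target split described above can be sketched as follows. The name `split_by_sizes`, the list of (file object, capacity) pairs, and the chunk size are hypothetical choices for illustration; there is no fail-over handling, so a write error on any target simply propagates to the caller.

```python
def split_by_sizes(src, outputs, chunk_size=65536):
    """Spread binary stream src across outputs, filling each in turn.

    outputs is a list of (file_object, capacity_in_bytes) pairs. Returns
    the number of bytes written to each output. Raises ValueError if the
    input is larger than the combined capacity.
    """
    counts = [0] * len(outputs)
    index = 0  # output currently being filled
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return counts
        while chunk:
            if index == len(outputs):
                raise ValueError("input is larger than the combined output capacity")
            out, capacity = outputs[index]
            # Write only as much as still fits on the current output.
            piece = chunk[:capacity - counts[index]]
            out.write(piece)
            counts[index] += len(piece)
            chunk = chunk[len(piece):]
            if counts[index] == capacity:
                index += 1  # current output is full; move to the next one
```

Unlike `split -b SIZE`, which cuts every piece to the same size, each target here gets its own capacity (for example, the size of the device it represents).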
Yeah, at this point I can see how it might be better to package the functionality in a slightly different way. I wasn't originally looking to submit it to coreutils, but realized that it would be nice for others to have easy access to some of the features. If I proceed with some future version of this project, I now know where the code will end up and can check the desired coding style beforehand.

> If nothing else, your calling convention looks different than cp. For
> example, where 'cp A B C' copies A and B into directory C, it appears that
> your proposed 'sparse A B C' copies A into sparse files B and C. I find
> that confusing; it may be better to model your command line syntax after
> cp, since cp already handles sparse files.

Actually the main use, as noted in the manpage, is to operate on the output of another, non-sparse-aware program. An additional flag (-p) must be passed for it to treat the first 'file' as an input source instead. All of my unit tests operate on existing files, though, so that isn't clear from them. I guess that is an extra test I should add if I make further changes.
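The calling convention described above (read standard input by default, and use -p to take the first FILE as the input) can be sketched with argparse to show how it differs from cp's. Only the -p flag comes from this thread; the function name `parse_args`, the use of "-" to mean standard input, and the rule that -p needs at least one output are assumptions for illustration.

```python
import argparse


def parse_args(argv):
    """Return (input_name, output_names) under the convention sketched here.

    "-" stands for standard input (an assumption, not the real tool's spelling).
    """
    parser = argparse.ArgumentParser(prog="sparse")
    parser.add_argument("-p", action="store_true",
                        help="treat the first FILE as the input instead of stdin")
    parser.add_argument("files", nargs="+", metavar="FILE")
    args = parser.parse_args(argv)
    if args.p:
        if len(args.files) < 2:
            # Hypothetical rule: with -p there must still be an output.
            parser.error("-p needs an input FILE and at least one output FILE")
        return args.files[0], args.files[1:]
    # Default: read from standard input and treat every FILE as an output.
    return "-", args.files
```

So `sparse A B C` would write standard input into A, B and C, while `sparse -p A B C` would read A into B and C, which is the reading Eric assumed.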
