On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
> On Friday 05 December 2008, Albert Hopkins wrote:
> > On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> > > Almost every time I split a large file >1G into say 200k chunks, then ftp
> > > it to a server and then:
> >
> > That's thousands of files! Have you gone mad?!
>
> Ha! small error in units . . . it is 200M (of course this is no disclaimer of
> me going/gone mad . . .) I think the server drops the connection above 230M
> file uploads or something like that, so I tried 200M files and it seems to
> work.
>
> > > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> > >
> > > if fails. Checking the split files in turn I often find 1 or two chunks
> > > that fail on their own md5 checks. Despite that the concatenated file
> > > often works (e.g. if it is a video file it'll play alright).
> >
> > Let me understand this. Are [1..7] the split files or the checksums of
> > the split files?
>
> They are the split files, which I concatenate into the complete file.
Well, unless you made another error in your OP, you are using md5sum
incorrectly. When you use "-c", md5sum expects a file containing a list of
checksums and filenames, not the data file itself. For example:
$ dd if=/dev/urandom of=bigfile bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
$ md5sum bigfile > checksum # create checksum file
$ split -b1M bigfile
$ rm bigfile
$ cat xa* > bigfile
$ # This is correct
$ md5sum -c checksum
bigfile: OK
$ # This is wrong!
$ md5sum -c bigfile
md5sum: bigfile: no properly formatted MD5 checksum lines found
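If you also want to know *which* chunk got corrupted in transit, you can
checksum every chunk before uploading and verify the whole set after the
transfer. A minimal sketch along the lines of the example above (the file
names are just what split produces by default; nothing else here is from
your setup):

```shell
# Make a test file and split it, as in the example above
dd if=/dev/urandom of=bigfile bs=1M count=3 2>/dev/null
split -b1M bigfile
# Record one "<checksum>  <filename>" line per chunk
md5sum xa* > chunks.md5
# After the transfer, this reports each chunk as OK or FAILED,
# so a single bad chunk can be re-uploaded on its own
md5sum -c chunks.md5
```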
[SNIP!]
> > Maybe if you give the exact commands used I might understand this
> > better.
> >
> > I have a feeling that this is not the most efficient method of file
> > transfer.
>
> split --verbose -b 20000000 big_file
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa xab xac
> xad . . .
>
> The above would fail after xaa was uploaded and about 1/3 or less of xab.
> So,
> I split up the individual file upload:
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa ; sleep
> 1m ; tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xab ;
> sleep ... ; etc.
>
> Does this make sense?
Yes, but if you are truly using "-c" the way you described, then it would
make sense that you get a checksum error while the file is actually OK.
Here's how I would do it. I'm not saying you should do it this way.
I'd use rsync. Rsync does the file transfer and has checksumming built
in. You say you split because you get disconnected, right? I'm not sure
whether rsync handles reconnects on its own, but you can write a loop so
that if rsync fails it picks up where it left off:
status=30
until [ "$status" -eq 0 ]; do
    rsync --append-verify big_file server_name:/htdocs/<directory_path>/
    status=$?
done
No splitting/concatenating and no need to checksum.
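If the connection drops hard, it may also help to pause between attempts
and cap the number of retries. A sketch of a generic retry wrapper --
`retry`, `MAX_TRIES`, and `PAUSE` are names I'm making up here, not
anything from your setup:

```shell
# retry CMD [ARGS...]: rerun CMD until it exits 0, sleeping between
# attempts; gives up after MAX_TRIES attempts
MAX_TRIES=10
PAUSE=60
retry() {
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        [ "$tries" -ge "$MAX_TRIES" ] && return 1   # give up
        sleep "$PAUSE"                              # let the link recover
    done
}
# usage: retry rsync --append-verify big_file server_name:/htdocs/<directory_path>/
```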