Klein, Roger wrote:
> I am using cut in an awkward situation: I got huge files that for any
> reason show larger file sizes than they actually have.
Those files are probably sparse files. Sparse files can be created by
using lseek(2) to seek to a different part of the file and then writing
data. The result is a file with data at different locations and a gap
(a "hole") between them. The filesystem can take advantage of this by
not allocating disk blocks for the hole, so the file consumes fewer
disk blocks than if the gap had been written out as zeros. For example
you can use dd to create a sparse file:

  dd bs=1 seek=1G if=/dev/null of=big

That will have an apparent size of 1G but will actually consume almost
no disk space.

> 'du' reports the correct sizes b.t.w.:
> # du -k boot_image.clone2fs
> 56740   boot_image.clone2fs

'du' reports the disk usage of the file. This value may be smaller
than the size of the file. Try using the --apparent-size option:

  du -k --apparent-size boot_image.clone2fs

> Now I found a hint on the Web
> (http://www.programmersheaven.com/mb/linux/187697/245244/re-how-to-change-filesize-in-linux/?S=B20000)
> for how to change the incorrect filesize by using cut to take over
> only a given amount of bytes into a new file:
>
>   cut -b 1-500 oldFile > newFile

Of course that will read every byte and write every byte, and the
result will no longer be sparse, assuming that the input file was
sparse. I don't think truncating the file is really what you want to
be doing here. If you really want to flatten the file then simply
copying it would seem to be better:

  cp --sparse=never file1 file2

or

  cat file1 > file2

> I never tried it on short files, but when I use this on the above
> file I get a very different result than expected:
> # cut -b 1-58101760 boot_image.clone2fs > boot_image.clone2fs_correct

Won't you need those bytes at the end of the file that you are
removing? I wouldn't expect this to be good. The ending part of the
file will be discarded, and I expect that you will be needing those
bytes at some point.
> # stat boot_image.clone2fs_correct
>   File: `boot_image.clone2fs_correct'
>   Size: 309987280   Blocks: 606048   IO Block: 4096   regular file

For what it is worth, those numbers don't seem right to me either. If
the original stat shows 1077411840 bytes then that is the size I would
hope to see in any copy.

> The number of blocks and the apparent size is all but correct now.

Try comparing the two files:

  cmp boot_image.clone2fs boot_image.clone2fs_correct

If they don't compare equal then I believe that you have corrupted the
file.

> To me this looks like a typical overflow problem. Could you please
> investigate this?

I think your problem is understanding the difference between the size
of a file and the disk space consumed to hold it.

These report the apparent file size:

  du --apparent-size
  ls -l
  stat (the Size field)
  wc -c
  ...most normal commands...

Versus these, which report disk usage:

  du
  stat (the Blocks field)

Try this experiment:

  rm -f big big2
  dd bs=1 seek=1M if=/dev/null of=big
  cat big > big2
  wc -c big big2
  cmp big big2
  ls -log big big2
  du big big2
  du --apparent-size big big2

Hope this helps,
Bob

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
