There are a few good reasons for using very large reads and writes when
possible (much larger than 1-4K). First, modern disks are very good at
dishing out large contiguous data streams -- and if you are lucky enough
to have all of the data of the file allocated contiguously, you'll save
time. Secondly, Linux is very good at caching disk blocks in memory, so
particularly if the file you are accessing has been accessed in the past,
you will find that large reads are better. This is because each read
requires a trap to the kernel -- and that overhead is large in proportion
to the time it takes to copy 1K of data from kernel buffers to your
program's buffer.
In general, if you know you are planning to read a whole lot of data, just create a
big buffer and read it 100K at a time.
If you really want to go all-out, try reading up on mmap. What this does
is map a portion of a file into your program's memory space, and it is
much more efficient because data doesn't actually need to be copied from
kernel space to user space.
Have fun,
Chris
> Occasionally, I have need to read the entire contents of a file (which
> can be of any length) just so I can write it out again. For example, I
> may want to create a copy of a file, or simply read a file and dump its
> contents to stdout. The following program, for example simply reads a
> file called tester.txt and prints it to stdout:
>
> #include <stdio.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> #define BUFSIZE 1024
>
> void print_file(const char *pathname) {
> int fd;
> char buf[BUFSIZE];
> int bytes_read;
>
> fd = open(pathname,O_RDONLY);
> if(fd == -1) {
> puts("error opening");
> return;
> }
>
> do {
> bytes_read = read(fd,buf,BUFSIZE);
> if(bytes_read != -1)
> if(write(1,buf,bytes_read) != bytes_read) {
> puts("error writing.");
> return;
> }
> } while(bytes_read > 0);
>
> if(close(fd) == -1)
> puts("error closing");
>
> }
>
> int main(void) {
> print_file("tester.txt");
> return 0;
> }
>
> The technique is to read and write the data in BUFSIZE byte chunks.
>
> First, is this a reasonable approach?
>
> Second, is there an exceptionally good number to use for BUFSIZE, for a
> particular installation of Linux? I've noticed that the struct stat
> structure, which is used by the stat() system call, has a member called
> st_blksize. For files on my system, st_blksize is 4096 == 4K. Is this
> the 'ideal' size for BUFSIZE (because maybe the OS reads data in
> chunks of that size)?
>
> Is st_blksize similar in concept to the "cluster size" on MS-DOS disks?
> Where does the 'inode' size fit into the picture? When I prepped my
> hard disk during the Slackware install, I set the inode size to 1K.
>
> Anyone have any experience in this area?
>
>
> Thanks,
> Steve Narmontas
>