Re: How to copy n bytes from stdin to stdout?

Steve Litt Thu, 28 Jun 2018 13:11:06 -0700

On Wed, 27 Jun 2018 10:06:40 -0700
xi <x...@nuxi.ca> wrote:

> > On Jun 25, 2018, at 16:19, Tomasz Rola <rto...@ceti.pl> wrote:
> > 
> > On Sun, Jun 24, 2018 at 10:53:37PM -0400, Steve Litt wrote:  
> >> On Thu, 21 Jun 2018 00:56:04 +0200
> >> Tomasz Rola <rto...@ceti.pl> wrote:
> >>   
> > [...]  
> >>> Craps. I have consulted OpenBSD's manpage for dd and there is no
> >>> mention of iflag. So this will not work on OpenBSD. I will have to
> >>> rethink this, sorry.
> >>>   
> >> 
> >> Untested...
> >> 
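> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> 
> >> int main(int argc, char* argv[]){
> >>  int c;  /* int, not char, so EOF stays distinguishable */
> >>  long l = (argc > 1) ? atol(argv[1]) : 0;
> >>  while(l-- > 0){
> >>    if ((c = getc(stdin)) != EOF)
> >>        putc(c, stdout);
> >>    else
> >>        break;
> >>  }
> >> return 0;
> >> }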
> >> 
> >> I haven't tested it so it might not be exactly right, and of course
> >> error handling would need to be added, but you know what I mean.
> >> IIRC getc() and putc() are very well buffered so it will be fast.
> >> In my youth I wrote similar functions using low level read() and
> >> write() and doing my own buffering, and those things were *really*
> >> fast, but I think that's overkill in this century.
> >> 
> >> As far as finding command line tools that do it, if that's becoming
> >> hard to do, why not just write a 10 line program?  
> > 
> > Actually, I have written a few such programs to satisfy my own
> > curiosity. I was dragged away from the computer and in the
> > meantime others joined the thread and even wrote a nice buffered
> > version of the solution in C. I pitted this solution against my
> > programs (in C with fgetc/fputc, and in Common Lisp with
> > read-sequence/write-sequence) and head-c.c was many times faster
> > (about a hundred times or more) than my programs.
> > 
> > I am not sure if there is a performance difference between
> > fgetc/fputc and getc/putc. The man page says getc may be
> > implemented as a macro equivalent to fgetc. Might be worth
> > checking, but I guess there is no difference.
> > 
> > My curiosity also "wanted" to know how much of a performance hit
> > to expect from Common Lisp, optimised to the best of my knowledge,
> > versus simplistic C. They were similar in performance, with the CL
> > version (compiled by SBCL) a few times slower, and head-c.c beat
> > them both by many lengths. I am a bit surprised that in CL,
> > performance was about the same whether reading one byte or many at
> > once. Perhaps I will find a way to speed it up some more.
> > 
> > As for finding command line tools, I had a working script in about
> > an hour (and a buggy one in a few minutes). Buggy, because
> > "dd | dd" is a bad idea, and after finding better options for using
> > dd in my script - which worked, but under Linux - I also found out
> > they would not work in OpenBSD.
> > 
> > So, I consider it a worthy lesson for myself. Next time, I might
> > just fire up Emacs and write a script in CL (mostly because this
> > is what is comfy for me nowadays, and I will not object to
> > having a compiled script for free). Or something similar, or maybe
> > even do it in C, why not.
> > 
> > BTW, the version of nread.sh with improved options was on par with
> > head-c.c, so writing a script with the right things inside is a
> > very good choice, too. If the script actually works :-) .
> > 
> > While the speed is not a big problem for input of about 1 megabyte,
> > it becomes a problem when gigabytes are copied.
> > 
> > -- 
> > Regards,
> > Tomasz Rola
> > 
> > --
> > ** A C programmer asked whether computer had Buddha's nature.      **
> > ** As the answer, master did "rm -rif" on the programmer's home    **
> > ** directory. And then the C programmer became enlightened...      **
> > **                                                                 **
> > ** Tomasz Rola          mailto:tomasz_r...@bigfoot.com             **
> 
> If you want to do this in C, you can also simply take advantage of
> the fact that read(2) takes a number of bytes as argument and stops
> reading at EOF:
> 
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> 
> int
> main(int argc, char *argv[])
> {
>       if (argc < 2)
>               return 1;
> 
>       size_t n;
>       char *argend = argv[1] + strlen(argv[1]);
>       if (!(n = strtoull(argv[1], &argend, 10)))
>               return 1;
> 
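>       char *buf = malloc(n);
>       if (buf == NULL)
>               return 1;
>       /* read() returns ssize_t: -1 on error, 0 at EOF */
>       ssize_t nr = read(0, buf, n);
>       if (nr > 0)
>               write(1, buf, (size_t)nr);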
>       free(buf);
> 
>       return 0;
> }
> 
> This is probably as fast as it gets for a task this simple, and it
> copies as many bytes as malloc is willing to allocate.


This is how I did it, both in C and in Turbo Pascal 3.0, back in the
1980's before getc() and putc() were well buffered, except I had the
read and write in a loop. If the buffer is a multiple of sector length,
it's remarkably fast. My findings were that a 64KB buffer was lightning
fast. In Turbo Pascal, a 512KB buffer was about 10% faster.

In this and successive threads people were talking about speed and a
visual progress indicator. It's trivial to add nr to a running total
(a long) after each write, and report it on stderr, since stdout is
carrying the data:

fprintf(stderr, "%ld bytes written.\r", total);

If that slows things down too much, do it on 1 out of every 100 writes.
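
For illustration only, here is a rough sketch of that read/write loop
with a running total. The names copy_n and BUFSZ are made up for this
example, the 64KB buffer is just the size mentioned above, and the
progress line goes to stderr on the assumption that stdout is carrying
the copied data. It is not the code I used back then.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define BUFSZ 65536  /* 64KB; hypothetical name, size taken from the text */

/* Copy up to n bytes from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on an I/O error.
 * Uses a static buffer, so it is not reentrant. */
long long copy_n(int in_fd, int out_fd, long long n)
{
    static char buf[BUFSZ];
    long long total = 0;
    unsigned long writes = 0;

    while (total < n) {
        size_t want = (n - total < BUFSZ) ? (size_t)(n - total) : BUFSZ;
        ssize_t nr = read(in_fd, buf, want);
        if (nr == 0)
            break;                      /* EOF before n bytes arrived */
        if (nr < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;
        }

        /* write() may accept fewer bytes than asked, so loop until done */
        ssize_t off = 0;
        while (off < nr) {
            ssize_t nw = write(out_fd, buf + off, (size_t)(nr - off));
            if (nw < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nw;
        }

        total += nr;
        if (++writes % 100 == 0)        /* progress on 1 of every 100 writes */
            fprintf(stderr, "%lld bytes written.\r", total);
    }
    return total;
}
```

A main() would parse argv[1] with strtoll() and call
copy_n(STDIN_FILENO, STDOUT_FILENO, n). Because the buffer size is
fixed, the byte count is not limited by what malloc is willing to
allocate.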

By the mid 1990's, I noticed that getc() and putc() had improved to the
point where in most cases handling my own buffers was overkill. By the
way, all of this was in the DOS OS. In Linux and BSD I think I've
always used getc() and putc().
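
For comparison, a minimal sketch of the getc()/putc() approach, leaving
all buffering to stdio. The function name copy_n_stdio is invented for
this example, and whether it is fast enough will depend on the C
library in use.

```c
#include <stdio.h>

/* Copy up to n bytes from in to out using stdio's own buffering.
 * Returns the number of bytes copied, or -1 on an I/O error. */
long copy_n_stdio(FILE *in, FILE *out, long n)
{
    long copied = 0;
    int c;  /* int, not char, so EOF stays distinguishable */

    while (copied < n && (c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        copied++;
    }
    if (ferror(in))
        return -1;
    return copied;
}
```

If the default stdio buffer turns out to be too small, calling
setvbuf(stdin, NULL, _IOFBF, 65536) (and the same for stdout) before
any I/O is one standard way to ask for a bigger one.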

SteveT

Steve Litt 
June 2018 featured book: Twenty Eight Tales of Troubleshooting
http://www.troubleshooters.com/28
