I see that you have subscribed now. Awesome! If you and others would be so kind as to list-reply instead of CC'ing me directly that would be great. I read the replies on the mailing list.

dirk+b...@testssl.sh wrote:
> Bob Proulx wrote:
> > You are doing something that is quite unusual.  You are using a shell
> > script direction on a TCP socket.  That isn't very common.
>
> Do you think there should be a paragraph NOT COMMON where bash sockets
> should rather belong to?

You actually had not included enough background information to know if
you were using the bash built in network implementation or not.  You
only showed that you had set up fd 5 connected to a network socket.
That can happen because, for example, a script was used to service an
inetd configuration or similar.  It doesn't actually need to be the
built in network protocol at all.  But now that you have said the
above I guess I can assume that you are using the built in
implementation.

As to whether the documentation should say this or not, that is not
really practical.  There are a godzillian different things that are
not typically addressed by writing a shell script.  As a practical
matter it is impossible to list everything out explicitly.  And if one
tries then the complaint is that the documentation is so long and
detailed that it is unusable.

Primarily a shell script is a command and control program.  It is very
good for that purpose.  It is typically used for that purpose.  That
is the mainstream use and it is very unlikely one will run into
unusual situations there.

But programming tasks that are much different from command and control
tasks, such as your program interacting by TCP with other devices on
the network, are not as common.  I don't have facts to back that up
but I do believe that to be true based upon the way I have seen shell
scripts being programmed and used over a long period of time.  Of
course if you have spent the last 20 years programming network shell
scripts then your observations will bias you the other way. :-)

> > More
> > typically one would use a C program instead.  So it isn't surprising
> > that you are finding interactions that are not well known.
>
> Bob, my intention was not to discuss program languages and what is typical
> with you or anybody else here.

Hmm...  Put yourself in our shoes.  You stood up on the podium that is
this public mailing list and spoke into the megaphone addressing all
of us complaining that bash's printf was buggy.  But from my
perspective printf is behaving as expected.  It is designed to deal
with line oriented data.  It will also deal with binary data if one is
careful.  But it appears that your application wasn't careful enough
and had tripped over some problems.

Should we (me!) keep silent about those very obvious problems?  It
feels obvious to me but apparently not to the author of the above.  As
has often been said many eyes make all bugs apparent.  I was pointing
this out to you as a public service.  But in response you seem hostile
by the language above and below.  That isn't encouraging any help. :-(

> >> printf -- "$data" >&5 2>/dev/null
> >
> > Why is stderr discarded?  That is almost always bad because it
> > discards any errors that might occur.  You probably shouldn't do this.
> >
> > What happens if $data contains % format strings?  What happens if the
> > format contains a sequence such as \c?  This looks problematic.  This
> > is not a safe programming practice.
>
> I doubt you can judge on this by just looking at a single line
> of code -- the project has > 18k LoC in bash.

That single line of code was problematic just by itself, standing
alone without the rest of the program around it.  That is independent
of anything the rest of the program might contain.

However if you would like to pass sections of the rest of the program
through the help-bash mailing list then I am sure the group there
would help improve the quality of it.

> Github is the place to discuss and do PRs for our project.

No.  Sorry.  You came here to this mailing list.  Therefore this is
the place to discuss it.  Please put yourself in my shoes.
Suppose the case were reversed: I came over to Github and then stated
that Github was not the place for the discussion but that you needed
to set up email and come over to my mailing list and discuss it there
instead.  How would you feel?  I would have come into your house,
asked you for help, and then wanted you to go elsewhere.  How would
you feel?  I can tell you that I do not feel very welcome by it.

Also remember that Github is a non-free service.  That is free as in
freedom, not free as in beer.  The free in Free Software.  Or in this
case the opposite of it being non-free.  We try not to use software
that does not respect our freedoms nor ask others to do so either.
It's a philosophy of life thing.  I hope you will understand.

> >> If there's a workaround, please let me know. (tried to add "%b" with no
> >> effect). Otherwise I believe it's a bug.

Note that I *did* provide you with a way to do what you wanted to
do. :-)

It was also noted in another message that the external standalone
printf command line utility did buffer as you desired.  That seems
another very good solution too.  Simply use "command printf ..." to
force using the external version.

Anyway...  Since printf is a text oriented utility it makes sense to
me that it would operate in line buffered output mode.

Let's look at the bash documentation for 'help printf':

  printf: printf [-v var] format [arguments]
      Formats and prints ARGUMENTS under control of the FORMAT.
  ...
      FORMAT is a character string which contains three types of
      objects: plain characters, which are simply copied to standard
      output; character escape sequences, which are converted and
      copied to the standard output; and format specifications, each
      of which causes printing of the next successive argument.

The format provided in your example in $data is interpreted as a
"character string".  Apparently newlines (\n a.k.a. 0x0a characters)
are used in the binary data in your implementation!
However as a newline character it is causing line buffered output to
be flushed, resulting in line oriented write(2) calls.

If you are trying to print raw binary data then I don't think you
should be using 'printf' to do it.  It just feels to me like the wrong
utility to use.  Also there was the problematic use of the data as the
format string.

Instead I would use utilities designed to work with binary data.  Such
as 'cat'.  I personally might prepare a temporary file containing
exactly the raw data that is needed to be transmitted and then use
"cat $tmpfile >&5" to transmit it.  Or if I wanted strict control of
the block size, making cat less appropriate, then I would use
"dd if=$tmpfile status=none bs=1M >&5" or some such where no
interpretation of the data is done.

However there may be a bug in the way bash opens that fd number 5 and
sets up buffering.  If it were me then I would look closely there.  It
is possible that when that file descriptor was opened it should have
been set to use block buffering instead of line buffering.  Since the
network socket is not a tty I would suspect that it should be using
block buffering.  That is what I would expect.  Therefore that is
where I would look for a bug.  Obviously I can be wrong though too.

One should double check that fd 5 is not a tty.

  if [ -t 5 ]; then

If it is a tty then I expect line buffering.  If it is not then I
would expect block buffering.  That is just a general statement about
programs using libc's stdio to write to it.

> > You can re-block the output stream using other tools such as 'cat' or
> > 'dd'.  Since you are concerned about block size then perhaps dd is the
> > better of the two.
> >
> >   | cat
>
> cat has a problem with binary chars, right? And: see below.

No.  It does not.  The 'cat' utility concatenates files.  From the cat
documentation:

  ‘cat’ copies each FILE (‘-’ means standard input), or standard input
  if none are given, to standard output.  Synopsis:

  ...

  On systems like MS-DOS that distinguish between text and binary
  files, ‘cat’ normally reads and writes in binary mode.  However,
  ‘cat’ reads in text mode if one of the options ‘-bensAE’ is used or
  if ‘cat’ is reading from standard input and standard input is a
  terminal.  Similarly, ‘cat’ writes in text mode if one of the
  options ‘-bensAE’ is used or if standard output is a terminal.

> > Or probably better:
> >
> >   | dd status=none bs=1M
> >
> > Or use whatever block size you wish.  The 'dd' program will read the
> > input into its buffer and then output that block of data all in one
> > write(2).  That seems to be what you are wanting.
>
> We actually use dd to read from the socket. Of course we could use
> writing to it as well -- at a certain point of time.

Great!  Problem solved then. :-)

I didn't say it before but since this is such a long email making it a
little longer won't hurt more.  The status=none dd option is a GNU
extension.  It is useful in this context.  But it is not a portable dd
option.  Other platforms may or may not implement it.  *BSD implements
it now but some of my beloved legacy Unix platforms do not.

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

> Still, a prerequisite would be that printf is the culprit and not
> how bash + libs do sockets.

The repeated mention of sockets nudges me to point out that sockets
are just files.  There is nothing special about them as such.  Trying
to find fault there is just a false path to follow.  Programs writing
to a file descriptor connected to a network socket don't "know"
anything about the network.  It is the network layer that is taking
each write(2) and sending out packets.

What is special is whether the device is a tty or not.  If it is a tty
then libc's standard I/O buffering does one thing.  If it is not a tty
then libc's standard I/O buffering does a different thing.  Let's look
at the documentation.

For me when I want to look up documentation matching my system I use
the locally installed info pages.  But for the purposes of showing
where this documentation exists I will point to the top of tree
version online.  However note that it may be newer than what you have
installed locally.

  https://www.gnu.org/software/libc/manual/html_node/Stream-Buffering.html#Stream-Buffering
  https://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html#Buffering-Concepts

  Newly opened streams are normally fully buffered, with one
  exception: a stream connected to an interactive device such as a
  terminal is initially line buffered.
  ...
  The use of line buffering for interactive devices implies that
  output messages ending in a newline will appear immediately -- which
  is usually what you want.

Additionally the stdio man page says:

  man stdio

  Output streams that refer to terminal devices are always line
  buffered by default; pending output to such streams is written
  automatically whenever an input stream that refers to a terminal
  device is read.  In cases where a large amount of computation is
  done after printing part of a line on an output terminal, it is
  necessary to fflush(3) the standard output before going off and
  computing so that the output will appear.

However I did not look at how bash's implementation of printf was
coded.  The above is just general information that generally applies
to all utilities.

> > P.S. You can possibly use the 'stdbuf' command to control the output
> > buffering depending upon the program.
> >
> >   info stdbuf
>
> That could be an option, thanks. Need to check though whether
>
> a) it doesn't fragment then -- not sure while reading it

I feel compelled to say that the network stack is going to transmit a
packet every time write(2) is called.  Programs doing the writing
don't know that they are writing to a network stream.  They are just
writing data using write(2).
If it is a fully network aware program then of course it may be using
sendto(2) or another network specific call.  But general filter
utilities are not going to be using those calls and are just going to
read(2) and write(2) and not have any specific network coding.  That's
part of the beauty of the Unix Philosophy.  Everything is a file.

In your case though you are trying to pump around binary data and are
using line oriented text utilities that are using line buffering and
that is where problems are being tripped over.

You are thinking of this as fragmentation because in the context of
your application that is how it appears.  But as a general statement
it isn't fragmentation.  It is just a data stream being written every
time it is being written.  Certainly any text program writing lines
out isn't going to be coded in any way that knows about TCP data
blocks.  For any program in the middle it is just lines of text in and
lines of text out.  Or in the case of other programs that deal with
binary data such as 'cat' it is just bytes in and bytes out.  The
concept of fragmentation belongs to a different layer of the software
block diagram.

[[ There is an old joke related to this too.  "The Unix way --
everything is a file.  The Linux way -- everything is a filesystem."
Haha!  And also a quote, "I think the major good idea in Unix was its
clean and simple interface: open, close, read, and write."
--Ken Thompson ]]

> b) it's per default available on every platform supported by testssl.sh.

The 'stdbuf' utility is included in GNU coreutils starting with
version 7.5 onward.  It may not be available on other platforms.  It
didn't feel like the right solution to me.  But I mentioned it in
passing in the P.S. because it is related.  Perhaps it will be useful
to you.

Hope this helps! :-)
Bob
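
P.S. For anyone wanting to try the temporary file approach mentioned
above, here is a minimal sketch.  The function name send_via_tmpfile
and its two arguments are hypothetical and are not taken from
testssl.sh.  It assumes the payload is held as an escape string such
as '\x16\x03\x01'.  The data is passed as an argument to '%b' instead
of as the format, so a stray % is copied literally, and cat moves the
bytes without interpreting them.  A \c in the data would still end the
'%b' output early.  Whether the payload reaches the socket in a single
write(2) depends on cat's buffer size, so treat that as something to
verify with strace and not as a guarantee.

```shell
#!/usr/bin/env bash

# send_via_tmpfile FD ESCAPED_DATA
# Hypothetical helper: expand the escapes in ESCAPED_DATA into a temporary
# file, then copy that file to the already open descriptor FD.
send_via_tmpfile() {
    local fd=$1 data=$2 tmpfile rc
    tmpfile=$(mktemp) || return 1

    # '%b' expands backslash escapes found in its argument.  Because $data is
    # an argument and not the format, a '%' inside it is not a format spec.
    printf '%b' "$data" > "$tmpfile"

    # cat reads the regular file and writes the bytes out unchanged.
    cat "$tmpfile" >&"$fd"
    rc=$?

    rm -f "$tmpfile"
    return "$rc"
}
```

With fd 5 already connected, usage would look like
send_via_tmpfile 5 "$data".  Errors from cat are left on stderr so
they are not silently discarded.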