I have run into a problem with sort not related to the collating sequence. :-) It turns out I found two things. This is the first part. Please fasten your seatbelt and hang on, I have a long message. Sorry about that but I had a lot to say. Please trim appropriately for any replies.
While sorting a moderately big file of around 200MB on HP-UX 10.20, the GNU textutils-2.0.14 sort needed to create temporary files for later merging. The program died with an error message that is rather confusing to the user. Note that '{' follows 'z' in ASCII.

    /tmp/sort{12345: No such file or directory

In sort.c, create_temp_file() sets up a temp_dir[] and then adds /sortXXXXXX to it in traditional fashion, yielding /tmp/sortXXXXXX for mkstemp("/tmp/sortXXXXXX"). All fine. More or less.

It seems that traditionally mkstemp() uses the process id and prepends a letter to yield a temporary filename like /tmp/sorta12345. This is the algorithm on HP-UX. But having only one letter severely limits the number of temporary file names that this algorithm can generate. It is limited to 26 files! In fact the man page says the following, which I have trimmed:

    NAME
         mktemp(), mkstemp() - make a unique file name
    ...
    Remarks:
         These functions are provided solely for backward compatibility
         and importability of applications, and are not recommended for
         new applications where portability is important.  For portable
         applications, use tmpfile() instead (see tmpfile(3S)).
    ...
    RETURN VALUE
         mktemp() returns its argument except when it runs out of
         letters, in which case the result is a pointer to the empty
         string "".

         mkstemp() returns an open file descriptor upon successful
         completion, or -1 if no suitable file could be created.
    ...
    WARNINGS
         It is possible to run out of letters.
    ...
    STANDARDS CONFORMANCE
         mktemp(): SVID2, SVID3, XPG2

[Actually it does not return the empty string, it returns "sort{12345", which I will report as a bug. But at that point, who cares?]

And checking the online docs at single_unix_specification_v2 I see:

    "It is possible to run out of letters.  For portability with
    previous versions of this document, tmpfile() is preferred over
    this function."

So it seems that, regardless of the convenience, a standard-conforming implementation of mkstemp() can run out of letters!
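To make the 26-name limit concrete, here is a minimal sketch of the one-letter-plus-pid scheme described above. It is not the HP-UX source. The function legacy_nth_name() and its parameters are hypothetical names invented for illustration, and the five-digit pid formatting is an assumption based on names like /tmp/sorta12345.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical illustration, not the HP-UX implementation.
 * Fill the trailing "XXXXXX" of tmpl with the n-th candidate name:
 * one letter ('a' for n == 0, 'b' for n == 1, ...) followed by five
 * digits of the process id.
 * Returns 0 on success, or -1 if the template is malformed or the
 * single letter position is exhausted (n >= 26).
 */
int legacy_nth_name(char *tmpl, unsigned long pid, int n)
{
    size_t len;

    assert(tmpl != NULL);
    len = strlen(tmpl);
    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0)
        return -1;              /* template must end in six X's */
    if (n < 0 || n >= 26)
        return -1;              /* out of letters: only 'a'..'z' exist */

    /* One letter plus five pid digits fills the six X's exactly. */
    snprintf(tmpl + len - 6, 7, "%c%05lu",
             (char) ('a' + n), pid % 100000UL);
    return 0;
}
```

With the template /tmp/sortXXXXXX and pid 12345, n = 0 gives /tmp/sorta12345 and n = 25 gives /tmp/sortz12345, while n = 26 is refused. A routine that skipped the range check would compute 'a' + 26, which is '{' in ASCII, matching the name in the error message above.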
I can't complain that the library routine should be fixed since it legally isn't broken according to the standards. The standard does not say how many temporary files must be provided, just that a limit may exist. So the code should operate within that limit, even though the limit itself is unknown.

I assume that the mkstemp() author was working on an old Version 6 or Version 7 system which only allowed 20 open files as a maximum, so providing 26 was more than the total number the system allowed and gave headroom besides. I can't recall when systems went from 20 to 60 files maximum. Now it is either 1024 or indefinite on most modern systems. Too bad the standard did not require mkstemp() to provide at least that many files as well so that it could have kept up with the times.

Of course nothing prevents implementations from increasing the number of files that mkstemp() can provide. This is what glibc does. TMP_MAX is 238328 there.

So what about tmpfile()? It uses a modern algorithm. But that is hard to use in this case. Because sort needs to be able to merge sorted files as part of its normal operation, it appears inconvenient to redesign the program to use tmpfile(), which returns FILE*s to unlinked files instead. There really is no equivalently convenient functionality for this task. I would stick with a mkstemp()-like function interface, although a wrapper around the tmpnam() routine could do the job. It would look similar to what is in the fallback code. At some point in the future I might advocate switching to tmpnam(), but I will reserve that opinion here. For now I just want to use the provided fallback code almost verbatim.

There is a fallback module in textutils just for this purpose. If a mkstemp() had not been provided in the library at all then autoconf would have determined that and the fallback routine would have done a perfectly fine and portable job of this function. This implementation appears to be the same as in glibc.
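For readers who have not looked at such a fallback, here is a rough sketch of the general technique: fill all six X's from a larger alphabet and let open() with O_EXCL decide whether the name is free. This is not the code in lib/mkstemp.c or lib/tempname.c. The name sketch_mkstemp(), the 62-character alphabet, and the use of rand() are assumptions made for brevity. The attempt limit is written as 62 * 62 * 62, which equals 238328; treating that as the origin of the glibc number is a guess.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alphabet: 26 lower case + 26 upper case + 10 digits = 62. */
static const char sketch_letters[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* 62 * 62 * 62 == 238328, the TMP_MAX value mentioned above. */
#define SKETCH_ATTEMPTS (62 * 62 * 62)

/*
 * Hypothetical mkstemp()-like routine.  Replaces the trailing "XXXXXX"
 * of tmpl with random characters and creates the file exclusively.
 * Returns an open file descriptor, or -1 with errno set.
 */
int sketch_mkstemp(char *tmpl)
{
    size_t len;
    char *suffix;
    int attempt, i, fd;

    assert(tmpl != NULL);
    len = strlen(tmpl);
    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0) {
        errno = EINVAL;
        return -1;
    }
    suffix = tmpl + len - 6;

    for (attempt = 0; attempt < SKETCH_ATTEMPTS; ++attempt) {
        /* rand() keeps the sketch short; real code wants a better source. */
        for (i = 0; i < 6; ++i)
            suffix[i] = sketch_letters[rand() % 62];

        /* O_EXCL makes creation fail if the name already exists. */
        fd = open(tmpl, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;          /* a real error, not a name collision */
    }
    errno = EEXIST;             /* every attempt collided */
    return -1;
}
```

Because all six positions vary, a caller such as sort can ask for far more than 26 names from one process, and the descriptor comes back with the file name still available for the later merge pass.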
Therefore in my patched version of sort I have forced fallback to the lib/mkstemp.c and lib/tempname.c functions included with textutils in order to correct this problem of running out of temporary files on HP-UX. This works fine.

It seems possible to always use the fallback code. But that does not seem right on systems that provide an improved version in their libc. Therefore I suggest the following. Provide a configure runtime test to check if mkstemp() can provide a reasonable number of temporary files. If so then go ahead and use it. But if not then #undef HAVE_MKSTEMP in config.h instead of defining it as it does now. Here is one possible test for configure.

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>   /* for strcpy() */

    int
    main()
    {
      int i;
      char buf[64];

      /* Ask for more names than a one-letter mkstemp() can supply. */
      for (i = 0; i < 30; ++i)
        {
          strcpy(buf, "/tmp/acmkstempXXXXXX");
          if (mkstemp(buf) < 0)
            exit(1);
        }
      return 0;
    }

Note the number of files should be larger than 26. Should it test to (60 - 3)? Or to some other limit? 2*26+1? I would guess the portability problem trigger is 26 and you either have a 26-limited routine or you don't. So any number past that would suffice.

But unlike Jim I am not a configure wizard and so I am not going to attempt to suggest a complete solution here. I will leave that as an exercise for the reader! But please do educate me on the proper way to do this. Among the problems that I don't know how to solve: how do the temporary files created by this test get cleaned up by the test? And what happens if someone is cross-compiling? [In the case of cross-compiling I would always fall back to the internal implementation because it should be good to go and the target machine can't be tested.] Are there other ways to solve this problem?

Now for a problem I see in the implementation in tempname.c. There I find the following code:

    #include <stdio.h>

    #ifndef P_tmpdir
    # define P_tmpdir "/tmp"
    #endif
    #ifndef TMP_MAX
    # define TMP_MAX 238328
    #endif

The P_tmpdir is correct.
The system should be allowed to define the default temp directory. But I believe the conditional definition of TMP_MAX is wrong, since the included implementation is not limited to the system's value of TMP_MAX. If the system defines that to be small, say 26, then we are back to the same problem we were trying to solve before! In this case it isn't small. HP-UX defines it to be 17576, which is probably fine for most conceivable practical tasks. But any given system could define this to be arbitrarily small. Why artificially limit the fallback code to the system limit when the whole reason the fallback code is being used is that the system version is found lacking?

The TMP_MAX in tempname.c should be unconditionally #undef'd and defined to be the value desired by the included implementation. The included fallback implementation is not limited to 238328 files and I don't know how that number was derived. I certainly would not want that many files in one directory. Please spare me that sight. But it is as good as any other number larger than a couple of thousand. The fallback implementation should not use the system value of TMP_MAX and should always define a reasonable value itself.

Thanks
Bob