On Tuesday 03 September 2013 23:17:09 Ángel González wrote:
> On 03/09/13 11:16, Tim Ruehsen wrote:
> > What should it say then?
> > My ideas are limited to something like
> > "There was an unexpected signal SIGBUS. It may be a bug or a misuse of
> > Wget or your hardware is broken. Please think about it.".
> >
> > This does not give more information than a "SIGBUS".
> > Ideas welcome.
>
> Well, if it shall provide more information...
>
> Error reading links.html file. I was expecting it to have 23K, but it
> now suddenly has only 420 bytes. Seems that another program has changed
> it behind my back. It is unacceptable to perform my job under these
> conditions.
> *wget exited*
That would be great, if it were possible. Right now I have no idea how to
print something like the above.

I ran Tomas Hozza's test under valgrind, with a Wget built with debug info.
I got SIGBUS in 18 out of 20 runs, but at completely different places in
the code. In the misuse test scenario, SIGBUS can occur at any place where
Wget accesses (reads or writes) memory allocated by wget_read_file(). It
happens completely at random and unpredictably whenever an outside process
changes the file size and/or content at the same time. And SIGBUS can also
occur for other reasons (e.g. real bugs in Wget).

As already said, replacing mmap with read would avoid the crash
(wget_read_file() then reads as many bytes as there are, without checking
the length of the file beforehand). But without additional logic, it might
read random data (many processes writing into the file at the same time,
not necessarily the same data). Wget would try to parse/change (-k) it, the
result would be broken, but no error would be printed. So replacing mmap is
not a solution, but maybe part of one.

Now to the possible solutions that come to mind:

1. While downloading/writing data, Wget could compute a checksum of the
   file, which allows verifying it later when re-reading the file. In that
   case we could really tell the user: hey, someone trashed our file while
   we were working... To get this working, we must remove the mmap code.

2. Use only temp files/temp dirs and move them to the right place
   afterwards. That would bring in some kind of atomicity, though there are
   still conflicts to solve (e.g. if a second Wget instance is faster,
   should we overwrite existing files/directories?).

3. Keep HTML/CSS files in memory after downloading. These are the ones we
   later re-read to parse for links/URLs. Write them to disk after parsing,
   using a temp file and a move/rename for atomicity.

4. Using (advisory) file locks only helps against other Wget instances (is
   that enough?).
   And with -k you have to keep a descriptor open for each file until Wget
   has finished downloading everything. That is not practical, since there
   could be tens or hundreds of thousands of files to download.

If someone would like to work on a patch, here is my opinion: I would
implement 1., as it is the least complex to code (though it needs more
CPU). Point 4 would not work in all cases.

Regards,
Tim
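Point 1 above could look roughly like the following sketch. It is not Wget code: `struct file_sum`, `file_sum_update()`, `write_and_sum()` and `verify_file()` are hypothetical names, and the 64-bit FNV-1a hash is only an assumed stand-in for whatever checksum would really be chosen. The checksum is updated with exactly the bytes written during the download; the later re-read uses read() instead of mmap, so a file truncated behind Wget's back produces a short read and an explanatory message (wording illustrative only) rather than a SIGBUS.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical running checksum (64-bit FNV-1a) plus byte count. */
struct file_sum {
  uint64_t hash;
  long long length;
};

static void file_sum_init (struct file_sum *s)
{
  s->hash = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
  s->length = 0;
}

static void file_sum_update (struct file_sum *s, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  for (size_t i = 0; i < len; i++)
    {
      s->hash ^= p[i];
      s->hash *= 1099511628211ULL;     /* FNV-1a 64-bit prime */
    }
  s->length += (long long) len;
}

/* Write BUF to FD and fold the same bytes into S. Returns 0 on success. */
static int write_and_sum (int fd, const void *buf, size_t len, struct file_sum *s)
{
  const char *p = buf;
  size_t left = len;
  while (left > 0)
    {
      ssize_t n = write (fd, p, left);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      p += n;
      left -= (size_t) n;
    }
  file_sum_update (s, buf, len);
  return 0;
}

/* Re-read FILE with read() (no mmap, so truncation cannot raise SIGBUS)
   and compare with the sum recorded while downloading.
   Returns 0 if unchanged, 1 if changed, -1 on I/O error. */
static int verify_file (const char *file, const struct file_sum *expected)
{
  int fd = open (file, O_RDONLY);
  if (fd < 0)
    return -1;

  struct file_sum actual;
  file_sum_init (&actual);
  char buf[8192];
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          close (fd);
          return -1;
        }
      if (n == 0)
        break;                         /* EOF: a shrunken file just ends early */
      file_sum_update (&actual, buf, (size_t) n);
    }
  close (fd);

  if (actual.length != expected->length)
    {
      fprintf (stderr, "%s: expected %lld bytes, found %lld; another program "
               "changed it while wget was running.\n",
               file, expected->length, actual.length);
      return 1;
    }
  if (actual.hash != expected->hash)
    {
      fprintf (stderr, "%s: content changed while wget was running.\n", file);
      return 1;
    }
  return 0;
}
```

The extra CPU cost mentioned above is one hash pass over every byte written and one more over every byte re-read.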
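For points 2 and 3 above, the temp-file-and-rename pattern could be sketched as follows, assuming POSIX mkstemp() and rename(); `write_file_atomically()` is a hypothetical helper, not an existing Wget function. The temp file is created next to the destination so that rename() stays on one file system and replaces the target in a single step: other readers see either the old file or the complete new one, never a half-written one.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: store LEN bytes of BUF as DEST by writing
   "DEST.XXXXXX" first and renaming it over DEST. Returns 0 on success. */
static int write_file_atomically (const char *dest, const char *buf, size_t len)
{
  size_t n = strlen (dest);
  char *tmp = malloc (n + sizeof ".XXXXXX");
  if (!tmp)
    return -1;
  memcpy (tmp, dest, n);
  memcpy (tmp + n, ".XXXXXX", sizeof ".XXXXXX");   /* includes the NUL */

  int fd = mkstemp (tmp);          /* creates a unique file with mode 0600 */
  if (fd < 0)
    {
      free (tmp);
      return -1;
    }

  const char *p = buf;
  size_t left = len;
  while (left > 0)
    {
      ssize_t w = write (fd, p, left);
      if (w < 0)
        {
          if (errno == EINTR)
            continue;
          close (fd);
          goto fail;
        }
      p += w;
      left -= (size_t) w;
    }
  if (close (fd) != 0)
    goto fail;

  /* Atomically replace DEST with the fully written temp file. */
  if (rename (tmp, dest) != 0)
    goto fail;
  free (tmp);
  return 0;

fail:
  unlink (tmp);                    /* drop the partial temp file */
  free (tmp);
  return -1;
}
```

This does not settle the conflict between two Wget instances racing for the same name; with plain rename() the last one simply wins. Since mkstemp() creates the file with mode 0600, a real implementation would probably also fchmod() it to the usual permissions before the rename.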
