----- Original Message -----
> Very well, if this would be possible. Right now I have no idea how to print
> something like the above. I ran Tomas Hozza's test under valgrind with a wget
> built with debug info. I got SIGBUS in 18 out of 20 runs, but at completely
> different places in the code. Within the misuse test situation, SIGBUS could
> occur at any place that accesses (reads or writes) memory allocated by
> wget_read_file(). It is absolutely random / unpredictable if an outside
> process changes the file size and/or content at the same time.
>
> And SIGBUS could also occur for any other reason (e.g. real bugs in Wget).
>
> As was already said, replacing mmap by read would not crash (wget_read_file()
> reads as many bytes as there are without checking the length of the file
> beforehand). But without additional logic, it might read random data (many
> processes writing into the file at the same time, not necessarily the same
> data). Wget would try to parse / change (-k) it, the result would be broken,
> but no error would be printed. So, replacing mmap is not a solution, but
> maybe a part of a solution.
>
> Now to the possible solutions that come to my mind:
> 1. While downloading / writing data, Wget could build a checksum of the file.
> This allows checking later when re-reading the file. In this case we could
> really tell the user: hey, someone trashed our file while we were working...
> To get this working, we must remove the mmap code.
>
> 2. Using tempfiles / tempdirs only and moving them to the right place. That
> would bring in some kind of atomicity, though there are still conflicts to
> solve (e.g. a second Wget instance is faster - should we overwrite existing
> files / directories?).
>
> 3. Keeping html/css files in memory after downloading. These are the ones we
> later re-read to parse them for links/URLs. Writing them to disk after
> parsing, using a tempfile and a move/rename to have atomicity.
>
> 4. Using (advisory) file locks just helps against other Wget instances (is
> that enough?). And with -k you have to keep the descriptor open for each
> file until Wget is done with downloading everything. This is not practical,
> since there could be (10-, 100-)thousands of files to be downloaded.
>
> If someone would like to work on a patch, here is my opinion: I would
> implement 1. as the least complex to code (but it needs more CPU). Point 4
> would not work in all cases.
>
> Regards, Tim
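To make the tempfile-and-rename idea from points 2 and 3 concrete, here is a minimal sketch, assuming a POSIX system. It is not taken from the Wget source; the helper name write_file_atomically() is hypothetical and only illustrates the pattern of writing to a temporary name in the same directory and then renaming it over the destination.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a Wget function): write len bytes to a temp
 * file next to dest, then rename() it into place. Readers of dest see
 * either the old content or the complete new content, never a partial
 * file. Returns 0 on success, -1 on error. */
static int write_file_atomically(const char *dest, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    size_t left = len;
    int fd, n;

    assert(dest != NULL && (data != NULL || len == 0));

    /* Same directory as dest, so rename() stays on one filesystem. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", dest);
    if (n < 0 || (size_t) n >= sizeof tmp)
        return -1;

    fd = mkstemp(tmp);              /* creates the file with mode 0600 */
    if (fd < 0)
        return -1;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t) w;
    }

    if (close(fd) != 0 || rename(tmp, dest) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Note that rename() silently replaces an existing destination, so this sketch does not answer the open question from point 2 about what to do when a second Wget instance has already written the file.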
Thanks for the brainstorming. Solution #1 seems the most reasonable to me. I
was thinking about 2. and 4., but they have the possible issues that you've
already mentioned.

I had a look at the source, but unfortunately the changes needed to create and
verify a checksum of downloaded files are not trivial.

Regards,
Tomas
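For illustration only, the core of solution #1 could look roughly like the sketch below. It is not based on the actual Wget code paths: the function names are hypothetical, and FNV-1a is used purely as a dependency-free stand-in for whatever hash a real patch would pick. The checksum is updated on every write during download, and the file is later re-read with plain reads (no mmap) and compared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a 64-bit parameters (stand-in hash, assumption for this sketch). */
#define FNV1A_INIT  UINT64_C(0xcbf29ce484222325)
#define FNV1A_PRIME UINT64_C(0x100000001b3)

/* Fold len bytes from buf into the running hash h and return the result. */
static uint64_t fnv1a_update(uint64_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    assert(p != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV1A_PRIME;
    }
    return h;
}

/* Download side: write one chunk and update the running checksum.
 * Returns 0 on success, -1 on a short write. */
static int write_with_checksum(FILE *fp, const void *buf, size_t len,
                               uint64_t *sum)
{
    if (fwrite(buf, 1, len, fp) != len)
        return -1;
    *sum = fnv1a_update(*sum, buf, len);
    return 0;
}

/* Re-read side: hash the file using ordinary buffered reads.
 * Returns 0 and stores the checksum in *out, or -1 on error. */
static int file_checksum(const char *path, uint64_t *out)
{
    unsigned char buf[8192];
    uint64_t h = FNV1A_INIT;
    size_t n;
    int failed;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        h = fnv1a_update(h, buf, n);
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    *out = h;
    return 0;
}
```

A caller would start the running sum at FNV1A_INIT, feed every downloaded chunk through write_with_checksum(), and before parsing compare the stored value with the result of file_checksum(). A mismatch means the file was changed by someone else in the meantime. In a real patch the parser would have to work on the same bytes that were hashed, otherwise the file could still change between the check and the parse.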
