Functioning as designed ... (Disclaimer: I am not an expert user of this program, but I have some experience that may help you:)
I guess you are Windows users. Unlike Unix and Linux systems, in Windows the last part of a file name (anything following the last ("rightmost") period) is considered the file extension and can be used to determine which application opens the file by default. E.g. a .html file would be opened by a browser, and a .doc (or nowadays a .docx) file would be given to a word processor such as Microsoft Word's winword.exe (the .exe indicating that this file contains executable code), etc.

That is one aspect of what is going on here - you downloaded something that was a .html file, but you didn't give it a name. Somewhere in the documentation it will tell you that (and presumably why) wget gives such a file the default file name "index" followed by the file extension.

The other aspect is what happens if you download a file to a location where a file of the same name and extension is already present. There are a few options, which you can choose between using parameters on the command line - and these options make good sense in certain circumstances and none at all in others. (I'll let you dig through the documentation of wget, since that is an important part of testing (evaluating) the program as part of your project ;-)

The most obvious choices you may want to try out are the following (they apply regardless of whether you are downloading a file named index.html or an image file named JamesBond007.jpg - I'll go with index.html as the example):

First option: Your existing file index.html is now outdated and the new version - with the same file name - will overwrite it. (Hint: in the language of the documentation, it will "clobber" the file.)

Second option: Your existing file should not be overwritten ("clobbered"), so even though your new file was meant to have the same name, it will be called index.html.1 or index.html.2 or - eventually - index.html.4711 and so on. This may not be pretty, but it is effective.
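To make that numbering concrete, here is a small Python sketch of the scheme. It only mimics the naming described above and is not wget's actual code; the function name next_free_name is made up for this illustration.

```python
import os
import tempfile


def next_free_name(directory, name):
    """Return `name` if it is unused in `directory`,
    otherwise the first unused name.1, name.2, ..."""
    candidate = name
    counter = 0
    while os.path.exists(os.path.join(directory, candidate)):
        counter += 1
        candidate = f"{name}.{counter}"
    return candidate


if __name__ == "__main__":
    # Simulate three unnamed downloads into an empty scratch directory.
    with tempfile.TemporaryDirectory() as scratch:
        for _ in range(3):
            target = next_free_name(scratch, "index.html")
            # Create an empty file so the next round sees the name as taken.
            open(os.path.join(scratch, target), "w").close()
            print(target)
```

The existing file is never touched; each new copy simply takes the next free number.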
Windows users typically would expect to see a different syntax (but wget is not just for Windows) - index (1).html, index (2).html, ..., index (4711).html might look more acceptable to you ...

Third option: When downloading files across a notoriously unreliable line, the process may be interrupted by a line failure before the file is complete. Wget then gives you the option to continue the download, appending the data from the retry to the end of the existing partial file - in my life that has been the option I used most, especially since Murphy's Law stipulates that the worse your line, the bigger your files.

Obviously, wget can't decide for you which of these options you need in any given situation. And it is pretty much impossible to fix the results after the fact if you chose the wrong one. What you can do, though, is rename all the .1, .2, .3, etc. files to something more sensible. And when you plan to download complete web sites or similar groups of files, wget offers you ways to drop them with sensible names (most likely taken from your source) into a suitable directory structure (e.g. one that duplicates the source structure).

Study the documentation that came with your downloaded copy of wget (or find it elsewhere on the web) and play with the program a bit more. Do come back here for more advice if/when needed. And I'll let the experts answer when their input is needed ;-)

Good luck,
Gerd

----- Original Message -----
From: "Joel F Leppänen" <joel.f.leppa...@student.lut.fi>
To: "bug-wget" <bug-wget@gnu.org>
Sent: Sunday, October 15, 2023 4:44:33 PM
Subject: Problematic default file naming system (BUG?)

Hi all,

We’re testing wget version 1.24.4 for a school project. When downloading an .html file, if you don’t name it and download additional .html files, also unnamed, it saves the second and the following files after that in formats that don’t exist.
The first one is saved as ”index.html” and the second one as ”index.html.1”, the third one as ”index.html.2” and so forth. The files can of course be changed back to .html-formats afterwards, but I feel like this is a bug that affects user experience negatively (or it’s intended, but I can’t figure out why that would be).

Regards,
Joel Leppänen and Werneri Punavaara
LUT University