Hello!

On Sat, Feb 10, 2024 at 03:14:02PM +1000, David Connors wrote:

> Hi All,
>
> I moved off IIS/Windows onto nginx on Ubuntu a while back. Since doing
> so I receive 404s for files with international characters in their name.
> I've added the charset utf-8 directive to the nginx config. Looking at the
> request:
>
> https://www.davidconnors.com/wp-content/uploads/2022/08/Aliinale-Für-Alina.pdf
>
> I have confirmed that the file exists on the filesystem:
>
> -rwx------ 1 www-data www-data 10560787 Aug 21 2022 Aliinale-Für-Alina.pdf
>
> If I copy it from that name to a.pdf and request that, it serves fine.
>
> The access log shows the character with the diacritic mark is escaped:
> 172.68.210.38 - - [10/Feb/2024:05:11:27 +0000] "GET
> /wp-content/uploads/2022/08/Aliinale-F%C3%BCr-Alina.pdf HTTP/1.1" 404 27524
> "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
> (KHTML, like Gecko) Version/17.2.1 Safari/605.1.15"
>
> What configuration directive am I missing?

File names on Unix systems are typically stored as bytes, and it
is the user's responsibility to interpret them according to a
particular character set. Since nginx returns 404, this suggests
that you don't have a file whose name contains the UTF-8 bytes
C3 BC: instead, there is something different. My best guess is
that your terminal uses Latin1 as its charset, and the name
contains an FC byte instead.

To see what is actually there, consider looking at the raw bytes
in the file name with something like "ls | hd".

Also, you can use the nginx autoindex module: it will generate a
page with properly escaped links, so it will be possible to access
files regardless of the charset used in the file names.

-- 
Maxim Dounin
http://mdounin.ru/
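
For illustration only, here is a minimal Python sketch of the byte-level
comparison described above. It is not part of nginx; the function names
(requested_name_bytes, list_raw_names) are hypothetical, and the Latin1
encoding of the file name is only the guess made above, not a confirmed
fact about the server. The sketch percent-decodes the name from the
access log and compares it with the same visible name encoded as UTF-8
and as Latin1.

```python
import os
from urllib.parse import unquote_to_bytes


def requested_name_bytes(escaped_name):
    """Percent-decode a URI path component into raw bytes.

    Assumption for this sketch: these are the bytes that must match
    the on-disk file name exactly for the lookup to succeed.
    """
    return unquote_to_bytes(escaped_name)


def list_raw_names(directory="."):
    """Return directory entries as raw bytes, without charset decoding.

    Passing a bytes path makes os.listdir return bytes names, which is
    roughly what "ls | hd" lets you inspect from the shell.
    """
    return sorted(os.listdir(os.fsencode(directory)))


# The escaped name exactly as it appears in the access log.
requested = requested_name_bytes("Aliinale-F%C3%BCr-Alina.pdf")

# The same visible name under two charsets (Latin1 is only a guess).
visible_name = "Aliinale-F\u00fcr-Alina.pdf"
utf8_name = visible_name.encode("utf-8")
latin1_name = visible_name.encode("latin-1")

print("requested:", requested.hex(" "))
print("utf-8    :", utf8_name.hex(" "))
print("latin-1  :", latin1_name.hex(" "))
print("request matches UTF-8 name :", requested == utf8_name)
print("request matches Latin1 name:", requested == latin1_name)
```

Running list_raw_names() on the uploads directory and comparing each
entry against "requested" would show whether any on-disk name matches
the request byte for byte.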