Hi Rudy,

IMHO UTF-8 encoding only makes sense in the context of plain text files (character-based files like txt, csv, tsv, xml, json, html, ...). It has no meaning for binary files (PDF, pictures). xlsx and docx files are essentially zip archives containing a bunch of xml files, and for xml files UTF-8 is the default encoding. But you (or your customer) should not worry about those.
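You can see the zip-of-xml structure for yourself with a few lines of Python. This is just a sketch: it builds a tiny in-memory archive with one made-up member name (`word/document.xml`, as a real docx would have) rather than opening an actual Office file.

```python
import io
import zipfile

# Build a tiny zip archive in memory to stand in for a .docx/.xlsx.
# A real file has many members ([Content_Types].xml, word/document.xml, ...).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml",
                '<?xml version="1.0" encoding="UTF-8"?><document/>')

# Reading it back: the members are plain XML text, UTF-8 by default.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    xml = zf.read("word/document.xml").decode("utf-8")

print(names)  # ['word/document.xml']
print(xml)    # the XML declaration names UTF-8 explicitly
```

Renaming a real .docx to .zip and opening it in any archive tool shows the same thing.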
The real problem arises when trying to import and process plain text files, especially for the high character codes. MacRoman encodes some characters differently from e.g. Latin-1, and those encodings have a limited range. Unfortunately, as Lutz also mentions, it is not always possible to determine the character encoding used when receiving a text file. The BOM is an indication for UTF files, but in my experience it is rarely used; a BOM is not required. FWIW, BBEdit also guesses what the text file encoding could be. It does a good job, but it can be fooled. E.g. create a new file in BBEdit and set the encoding to Windows Latin-1. Enter the text ˧ and save the file. Close it and reopen it in BBEdit. It will now say UTF-8 and show different content.

HTH
Koen

> Op 10 jan. 2020, om 22:58 heeft Two Way Communications via 4D_Tech <[email protected]> het volgende geschreven:
>
> If, e.g., I look at a pdf file in BBEdit, it says ‘Mac Roman’.

--------------------
Compass bvba
Koen Van Hooreweghe
Kloosterstraat 65
9910 Aalter
Belgium
tel +32 495 511.653

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive: http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:[email protected]
**********************************************************************
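PS: both effects Koen describes can be reproduced in a few lines of Python, as a sketch. The codec names `mac_roman` and `latin_1` are Python's names for those encodings, and the sample characters below are my own illustration, not the exact text from the BBEdit example.

```python
# One and the same high byte means different characters
# depending on which single-byte encoding you assume:
raw = b"\xa5"
print(raw.decode("mac_roman"))  # '•' (bullet)
print(raw.decode("latin_1"))    # '¥' (yen sign)

# How an encoding guesser gets fooled: certain Latin-1 character
# pairs happen to form valid UTF-8 byte sequences. A file saved as
# Latin-1 then reopens as UTF-8 with different (shorter) content.
saved = "Ã©".encode("latin_1")   # two bytes: 0xC3 0xA9
print(saved.decode("utf_8"))     # 'é' -- one character, not two
```

Since the Latin-1 bytes decode cleanly as UTF-8, a guesser has no way to know the file was ever meant to be Latin-1, which is exactly why detection can never be fully reliable.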

