Hi Eli,

I'm relieved to say that I've found a workaround for the current situation. It's another ad-hoc fix, but I was facing a big mess, and this was an easy way out.
I've now been able to state more specifically what's going on. The characters in question are ordinary 8-bit extended ASCII characters, like the European vowels with accents, or the 8-bit equivalents of the single and double quotes. These characters come from ordinary web pages, PDF files, or Word documents. They're fairly standard in English-language media, and of course also in foreign-language media.

You suggested that I use the "raw-text" coding system, implying that these characters are random binary data. But they're actually completely valid 8-bit characters that are commonly used in Western media.

I'm beginning to remember now why, ten years ago, I set up the default coding system to be "windows-1252-dos". This was the coding system most often used to display web pages in IE and Firefox. This coding system is standard because it correctly displays all the characters in web pages from North American and European web sites. Since I wanted exactly the same thing in the editor, I used the same coding system in Emacs. And this works very well in Emacs. The 8-bit characters are displayed exactly the way they should be. Furthermore, saving and reloading the text file preserves the 8-bit characters, so all is well.

The exception is when Emacs loads a large Windows text file containing sufficiently many 8-bit European characters, goes through its sampling algorithm, and unilaterally declares it to be a Unix file. This is the nightmare scenario I've been talking about, and it's typically a disaster. Emacs does something to every 8-bit character so that it displays incorrectly, using that octal format, creating a huge mess. Furthermore, ordinary commands stop working. For example, forward-paragraph no longer works, because ^M is no longer recognized as an end-of-line character.
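To make the byte-level picture concrete, here is a minimal Python sketch. It is plain Python, not anything Emacs itself runs, and the helper names (`describe_byte`, `line_ending_counts`), the sample bytes, and the `RAW_BYTE_OFFSET` constant are illustrative assumptions. The offset is inferred from the "C-x =" output quoted further down (#x3fffe9 for the byte #xE9). The sketch shows how one 8-bit value reads as windows-1252 text versus as an octal escape, and counts Windows versus Unix line endings in a byte string.

```python
# Illustrative sketch only: plain Python, not Emacs internals.

# Assumption: inferred from the "C-x =" output, where byte #xE9 is
# reported as the raw-byte character #x3fffe9.
RAW_BYTE_OFFSET = 0x3FFF00


def describe_byte(b):
    """Show one 8-bit value as windows-1252 text and as an octal escape."""
    return {
        # Raises UnicodeDecodeError for the few bytes cp1252 leaves undefined.
        "cp1252": bytes([b]).decode("cp1252"),
        "octal_escape": "\\%o" % b,
        "raw_byte_code": RAW_BYTE_OFFSET + b,
    }


def line_ending_counts(data):
    """Return (CRLF pairs, bare LFs) found in a byte string."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf
    return crlf, bare_lf


# Hypothetical sample: accented vowel, 8-bit curly quotes, DOS line endings.
sample = b"caf\xe9 \x93quoted\x94\r\nsecond line\r\n"

info = describe_byte(0xE9)
print("U+%04X" % ord(info["cp1252"]), info["octal_escape"], info["raw_byte_code"])
print(line_ending_counts(sample))
```

A file whose line endings are all CRLF gives a bare-LF count of zero here, which is the property I would expect a Windows text file to have regardless of how many 8-bit characters it contains.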
So the net result is that Emacs loads a Windows text file on a Windows system, decides that it's really a Unix file (which it isn't), and then really damages the file in a way that's almost impossible to recover from. Eli, this is not something that an editor should be doing gratuitously.

So anyway, as I said, I found an ad-hoc workaround. I have this very large text file that's in this damaged state, and I was dreading having to go through and fix it character by character, and that's what motivated my original message. So the ad-hoc workaround is this:

* Open the file in Notepad. All the 8-bit characters are displayed correctly.
* Select and copy the entire text in Notepad.
* In Emacs, open a new text file.
* Paste the text that you copied from Notepad.
* Save the result.

Much to my relief, this cures all the 8-bit problems, and I can go back to reloading and editing the file in Emacs.

I have a few additional notes:

Note 1: You asked me to select the problem characters and type "C-x =". After going through the workaround, I can now look at "before" and "after" versions of the same text in two different files and buffers. So I select the character é (e with an acute accent, as in the first letter of the French spelling of the word elite). Here is the information that "C-x =" provides in each of the two cases, the damaged and the repaired file respectively:

  Char: \351 (4194281, #o17777751, #x3fffe9, raw-byte) point=76501 of 343691 (22%) column=51
  Char: é (233, #o351, #xe9, file #xE9) point=74734 of 336596 (22%) column=51

Note 2: As an additional experiment, I opened the repaired file in "emacs -Q". It comes up with a coding system of "raw-text-dos", and it displays the above character as "\351", but without declaring it to be a Unix file. If I use "C-x =" on the same character, I get the following:

  Char: \351 (4194281, #o17777751, #x3fffe9, raw-byte) point=74734 of 336596 (22%) column=51

Note 3: You asked what software I'm running:

  OS: Windows 7 Professional
  Editor: GNU Emacs 25.1.1 (i686-w64-mingw32)
  WP: Microsoft Word 2003 and 2013
  Browser: Firefox Quantum 62.0 (64-bit)

So I hope that information is helpful. I'm really relieved that I found this latest ad-hoc workaround, but if there's any way to provide an option so that I can completely suppress that Unix identification algorithm, I would really appreciate it, and I suspect that I'm not the only one.

Thanks.

John