> I explained there is no such thing

Of course there is backward compatibility in many 8-bit encodings. An 
ASCII-encoded file can be opened and treated as ISO-8859-1, for example (well, 
maybe not so easily in Geany...), since the former is a subset of the latter.
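
To illustrate the subset relation, here is a minimal Python sketch (plain Python, nothing Geany-specific; the sample strings are made up for illustration):

```python
# Pure-ASCII bytes: every value is below 128.
data = "plain ASCII text\n".encode("ascii")

# Decoding the same bytes as ASCII or as ISO-8859-1 yields identical text.
as_ascii = data.decode("ascii")
as_latin1 = data.decode("iso-8859-1")
same = as_ascii == as_latin1

# The reverse does not hold: an ISO-8859-1 byte above 127 is not ASCII.
try:
    b"caf\xe9".decode("ascii")
    reverse_ok = True
except UnicodeDecodeError:
    reverse_ok = False
```

Here `same` ends up true and `reverse_ok` false, which is all I mean by "backward compatibility".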

> "no specified encoding" might be a good label for the setting

Maybe in the `Set Encoding` submenu, but as you already mentioned, it is still 
misleading in the Preferences. Since you described it as _Don't try anything 
first, just search_, could it simply be named `Auto-detect`, `Auto-search` or 
something like that?

Also, doesn't the `Without encoding (None)` setting effectively nullify its 
parent `Use fixed encoding when opening non-Unicode files`, and if so, is it 
actually redundant?

> the file happened to be a valid encoding

Autodetection is particularly sensitive to the first few bytes, but not in 
every case. I still think there's something wrong with encoding detection in 
Geany, setting the 8-bit values aside. I hoped the OP's examples would be 
convincing enough, but I'll provide more relevant examples when I encounter 
that again.

> The encoding detection is not buggy, the files are

Oh, come on. As if Geany were a sticky notes application, not a versatile 
developer's tool. I agreed in the very beginning that a hex editor was the 
correct generic answer, but the question remains whether Geany is really going 
to refuse to open hybrid files that it potentially could. Although it is 
apparent now that some additional hackery would be needed for that.

> the solution would be to fix the file

Of course, and I'm doing that while also outputting an ordinary log file in the 
very same program. But this is a case of a debugging log with raw 8-bit values, 
with 99.9% of the content being ASCII, and I was interested in the actual 
values of 128 and above.

> the Geany buffer has to be valid UTF-8 with no embedded NULLs

I understand the NULL value limitation, and even then I think the user should 
be actively notified and also given a chance to load the file up to the first 
NULL occurrence, with a red data-loss warning (and possibly a `Save As...` 
shortcut) included, of course. But not having a chance at all is not really a 
solution.
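
What I have in mind is roughly the following, as a minimal Python sketch only: `load_until_nul` is a hypothetical helper, not an existing Geany function, and the returned flag stands in for the data-loss warning.

```python
def load_until_nul(data: bytes):
    """Return (prefix, truncated).

    prefix    -- the bytes before the first NUL (or all bytes if none)
    truncated -- True if anything was cut off, i.e. the user must be warned
    """
    idx = data.find(b"\x00")
    if idx == -1:
        return data, False
    return data[:idx], True


# Example: everything after the first NUL is dropped and flagged.
prefix, truncated = load_until_nul(b"log line 1\n\x00binary tail")
```

With the flag set, the editor could show the warning and offer `Save As...` instead of refusing the file outright.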

> the input must be valid in UTF-8

Setting the NULL limitation aside, we are talking about values 1-255, which are 
valid UTF-8. So theoretically any nonzero uint8_t data can be represented in 
such a buffer or am I mistaken somewhere?
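
A quick way to test that assumption, as a hedged Python sketch (the helper names are hypothetical and this is not how Geany works internally): it checks whether raw bytes decode as UTF-8 on their own, and separately whether mapping each byte to the code point of the same number (i.e. reading the data as ISO-8859-1) gives a NUL-free UTF-8 buffer that round-trips.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the raw bytes form a valid UTF-8 sequence as they are."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def bytes_to_utf8_buffer(data: bytes) -> bytes:
    """Map byte n to code point U+00nn (ISO-8859-1), then encode as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_buffer_to_bytes(buf: bytes) -> bytes:
    """Inverse mapping: recover the original raw bytes from the buffer."""
    return buf.decode("utf-8").encode("iso-8859-1")


raw = bytes(range(1, 256))          # every nonzero uint8_t value once
buf = bytes_to_utf8_buffer(raw)     # UTF-8 representation of those values
```

If this check is right, a lone byte such as `b"\xe9"` is not valid UTF-8 by itself, while the mapped buffer `buf` is valid, contains no NUL and converts back to `raw`. So the representation seems possible, but only through a conversion step rather than by treating the raw bytes as UTF-8 directly.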

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/2910#issuecomment-932823912
