Well… what can we do?  No, we don't support files with multiple encodings in 
them (and I'm not aware of any tool handling that either).  How would you 
suggest Geany treat this file?

Magically recognizing which chunks are in which encoding is not really 
viable, because detecting encodings is virtually impossible except for a few 
select encodings (like UTF-8, but even then a UTF-8 file will usually open 
just fine as e.g. CP1251: it's valid, at worst a bit odd).
The rocket scientists in this area use statistics of the most likely 
character occurrences to try to make the best choice, but again, that's 
purely statistical and can easily be wrong (imagine a file in ISO-8859-15 
containing only `œ`: it's not statistically likely, yet totally valid).  The 
best solution remains letting the user choose.  Or using a mostly unambiguous 
encoding, like UTF-8 :)
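To make the detection problem concrete, here is a minimal Python sketch (not Geany's actual code). `is_valid_utf8()` and `guess_encoding()` are hypothetical helpers, and the ISO-8859-15 fallback is an arbitrary assumption. It shows that UTF-8 bytes also decode without error as CP1251, and that a lone `œ` in ISO-8859-15 is just the byte 0xBD, which reads as `½` in ISO-8859-1.

```python
from typing import Optional


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data: bytes, user_choice: Optional[str] = None) -> str:
    """Hypothetical policy: an explicit user choice wins; otherwise accept
    UTF-8 if the bytes validate, else fall back to an arbitrary default."""
    if user_choice is not None:
        return user_choice
    if is_valid_utf8(data):
        return "utf-8"
    return "iso-8859-15"  # assumed fallback; every byte decodes in it


# UTF-8 bytes are also "valid" CP1251: they decode without error,
# just into mojibake.
utf8_bytes = "Привет".encode("utf-8")
print(utf8_bytes.decode("cp1251"))

# A lone "œ" in ISO-8859-15 is the single byte 0xBD ...
oe = "œ".encode("iso-8859-15")
print(oe)                                          # b'\xbd'
# ... which is invalid UTF-8 and reads as "½" in ISO-8859-1.
print(is_valid_utf8(oe), oe.decode("iso-8859-1"))  # False ½
```

A passing UTF-8 check only makes UTF-8 likely, never certain; pure ASCII, for instance, is valid in all of these encodings.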

In the case of a diff file, we *could* probably either try to look at each 
file on disk and guess its encoding (if we can find it, which is unlikely as 
the paths are not absolute, and there's no guarantee the user viewing the diff 
has the repository on their machine), or maybe "simply" recognize chunks in 
the diff and guess each one separately.
This, however, presents several problems:
+ it requires special handling of diff files at the loading level;
+ it requires *very* special handling of diff files at the saving level, to be 
able to do the proper conversion in the opposite direction.  This part would be 
especially tricky;
+ it requires parsing the file even before converting it, which might or might 
not be a problem (assuming all encodings are ASCII-compatible, and diff only 
uses ASCII for its own markup, it should be doable);
+ doing all this means even more encoding guessing than currently happens, 
and thus even more room for picking the wrong one at some point; it also makes 
user overrides a lot harder.
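The "recognize chunks and guess each separately" idea, and why saving is the hard part, can be sketched roughly as follows. This is a hypothetical Python sketch, not a proposal for Geany's code. It assumes git-style diffs whose per-file sections start with an ASCII `diff ` header line, an ASCII-compatible encoding in every section, and the same weak "strict UTF-8, else an arbitrary fallback" guess per section. Loading records one encoding per section; saving has to re-encode each section with that same encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    encoding: str  # encoding guessed for this file's section of the diff
    text: str      # section decoded with that encoding


def split_diff(data: bytes) -> List[bytes]:
    """Split a git-style diff into per-file sections, working on raw bytes.

    Hunk lines always start with ' ', '-', '+' or '\\', so a line starting
    with b"diff " can only be a section header.
    """
    sections: List[bytes] = []
    current = b""
    for line in data.splitlines(keepends=True):
        if line.startswith(b"diff ") and current:
            sections.append(current)
            current = b""
        current += line
    if current:
        sections.append(current)
    return sections


def guess(section: bytes, fallback: str = "iso-8859-15") -> str:
    """Weak per-section guess: strict UTF-8, else an arbitrary fallback."""
    try:
        section.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


def load(data: bytes) -> List[Chunk]:
    """Decode each section with its own guessed encoding."""
    chunks = []
    for section in split_diff(data):
        enc = guess(section)
        chunks.append(Chunk(enc, section.decode(enc)))
    return chunks


def save(chunks: List[Chunk]) -> bytes:
    """Re-encode each section with the encoding it was loaded with.

    Raises UnicodeEncodeError if an edit put a character into a section
    that its encoding cannot represent.
    """
    return b"".join(c.text.encode(c.encoding) for c in chunks)


raw = (
    b"diff --git a/ru.txt b/ru.txt\n+" + "Привет\n".encode("utf-8")
    + b"diff --git a/fr.txt b/fr.txt\n+" + "cœur\n".encode("iso-8859-15")
)
chunks = load(raw)
print([c.encoding for c in chunks])  # ['utf-8', 'iso-8859-15']
assert save(chunks) == raw           # unedited round trip is lossless
```

The problems listed above show up even in this toy: an ISO-8859-15 section whose bytes happen to be valid UTF-8 gets the wrong guess, `save()` fails as soon as someone types, say, a Cyrillic letter into the ISO-8859-15 section, and there is no obvious place for a per-section user override.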



---
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/873#issuecomment-172532037