#2431: ffmpeg subtitle encoding of special characters does not working correctly
-------------------------------------+-------------------------------------
             Reporter:  Nick         |                    Owner:
                 Type:  defect       |                   Status:  new
             Priority:  normal       |                Component:  undetermined
              Version:  git-master   |               Resolution:
             Keywords:  sub srt      |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------

Comment (by Nick):

 You are right, the UTF-8 BOM is optional, but there are software tools that can detect the encoding type (meaning ANSI text, UTF-8 with BOM, or UTF-8 without BOM, but not the code page). I tested MP4Box with *.srt files in ANSI, UTF-8 (with BOM), and UTF-8 without BOM. MP4Box seems to detect the encoding type and produces the same result in all three cases, so it is possible. Another example is the open source tool Notepad++, which can also detect the encoding type. Maybe the source code of such tools contains methods for detecting the right encoding type.

 ISO-8859-1 and CP-1252 are not exactly the same, but the special characters used in my "subtitle_test.srt" are the same in both. Hence the little comment in my srt file ;-) ... ''"These are printable characters of ISO-8859-1: (*str >= 32 && *str < 128) || (*str >= 160 && *str <= 255)"'' ... for this range the two are exactly the same.

 For most European languages, such as French, German, Italian, and Spanish, it is enough to use CP-1252 or ISO-8859-1 as the default.

 '''More important for the imported subtitle file is the question: "Is it plain text or is it already UTF-8?"''' [[BR]]

 My proposal for selecting a default code page for every subtitle stream:

 - If no language is defined for the subtitle stream, or the language is unknown: [[BR]]
   --> use CP-1252 as the default (or ISO-8859-1)
 - If a language is defined (e.g. with '''-metadata:s:s:0 language=ger'''): [[BR]]
   --> use a selection table to set a code page automatically
 - If a dedicated code page is selected by an option like "''-sub_charenc''": [[BR]]
   --> use that setting instead of the other ones

--
Ticket URL: <https://ffmpeg.org/trac/ffmpeg/ticket/2431#comment:8>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker
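
A minimal sketch of the selection order proposed in the comment above, written in Python rather than FFmpeg's C purely for illustration. The function names, the contents of the language table, and the rule that a BOM or otherwise valid UTF-8 is tried before any code page lookup are assumptions made for this example; none of it describes what FFmpeg actually does.

```python
import codecs

# Hypothetical selection table (ISO 639-2 code -> code page).
# The entries are illustrative only, not a complete or authoritative mapping.
LANGUAGE_CODE_PAGES = {
    "ger": "cp1252",
    "fre": "cp1252",
    "ita": "cp1252",
    "spa": "cp1252",
    "rus": "cp1251",
}

DEFAULT_CODE_PAGE = "cp1252"  # proposed fallback when the language is unknown


def pick_code_page(language=None, sub_charenc=None):
    """Return the code page to use for text that is not UTF-8."""
    if sub_charenc:
        # An explicit user setting wins over everything else.
        return sub_charenc
    # Unknown or missing language falls back to the default.
    return LANGUAGE_CODE_PAGES.get(language, DEFAULT_CODE_PAGE)


def decode_subtitle(data, language=None, sub_charenc=None):
    """Decode raw subtitle bytes to text (assumed order of checks)."""
    if sub_charenc:
        return data.decode(sub_charenc)
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with BOM: strip the marker, then decode.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        # UTF-8 without BOM: strict decoding rejects most legacy 8-bit text.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: treat as "ANSI" text in the selected code page.
        # errors="replace" covers the few bytes CP-1252 leaves undefined.
        return data.decode(pick_code_page(language), errors="replace")
```

With this order, the same subtitle text saved as ANSI, UTF-8 with BOM, or UTF-8 without BOM decodes to the same string, which matches the MP4Box behavior described above. Strict UTF-8 validation is only a heuristic: a short legacy-encoded file can, in rare cases, also happen to be valid UTF-8.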