On 3/29/2017 1:05 PM, The Tick wrote:
On 3/29/2017 2:36 PM, Richard Hipp wrote:
Most of the world is using UTF-8 now.

I'm wondering how that can be for programming language source files.

I managed to put the "bom" in front of a one-line tcl script:

puts "This is a copyright symbol: ©."

where the '©' was previously converted to utf-8 by fossil.

gvim now reads the file and renders the utf-8 '©' as a '©'
notepad displays the file and renders the utf-8 '©' as '©'

If a BOM encoded in UTF-8 is present, that unambiguously marks the text as UTF-8. But, as you note, that is not always compatible with other uses of the file. Because UTF-8 was designed to be highly compatible with ASCII, including the BOM is not usually recommended unless it is required for other reasons.
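To make the byte-level picture concrete, here is a minimal Python sketch (Python only because it makes the bytes easy to inspect; the helper name has_utf8_bom is my own invention). It shows the three BOM bytes, the two bytes that '©' becomes in UTF-8, and what those two bytes look like when a program assumes Latin1 instead.

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded as three bytes: EF BB BF.
BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the UTF-8 BOM."""
    return data.startswith(BOM)


# The one-line script from this thread, as it would be stored with a BOM.
script = BOM + 'puts "This is a copyright symbol: \u00a9."\n'.encode("utf-8")

# In UTF-8 the copyright sign is two bytes (C2 A9), not one.
copyright_bytes = "\u00a9".encode("utf-8")

# A reader that assumes Latin1 shows those two bytes as two characters.
mojibake = copyright_bytes.decode("latin-1")

# Python's "utf-8-sig" codec decodes UTF-8 and drops a leading BOM.
text = script.decode("utf-8-sig")

print(BOM.hex(), copyright_bytes.hex(), mojibake, has_utf8_bom(script))
```

A tool that sees the BOM knows to use UTF-8; a tool that ignores it and assumes an 8-bit codepage will show an extra character in front of the '©'.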

VIM seems to default out of the box to Latin1 encoding, which is more consistent with Windows. (More correctly, it defaults to an encoding consistent with the current locale, which on Windows is usually Latin1 or another 8-bit codepage. Windows does support a UTF-8 codepage (aka 65001), but I've never seen that set as the system default.)

You can (probably) change it to support UTF-8, but it seems to make that task as difficult as possible for a novice to the weird and subtle world of file encoding issues. My copy of VIM 8 on Win 10 Pro correctly reads UCS-2 (16-bit Unicode) files with BOMs, but proudly converts them to Latin1 for display and editing, which would clearly be a bad idea if they included characters outside the coverage of the Latin1 codepage. Copyright and a number of other non-ASCII but otherwise ordinary symbols are included in Latin1 and work as expected.
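As a rough illustration of why converting to Latin1 is only safe for some text, the following Python sketch checks whether a string fits in Latin1. The function name fits_latin1 and the sample strings are assumptions of mine, and it says nothing about how VIM performs the conversion internally.

```python
def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character exists in Latin1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# A small 16-bit Unicode file with a BOM (little-endian). For characters
# in the Basic Multilingual Plane, UCS-2 and UTF-16 use the same bytes.
data = "\ufeff\u00a9 \u03bb".encode("utf-16-le")

# The generic "utf-16" codec uses the BOM to pick the byte order.
decoded = data.decode("utf-16")

# The copyright sign survives a trip through Latin1; Greek lambda does not.
lossy = "\u03bb".encode("latin-1", errors="replace")

print(decoded, fits_latin1("\u00a9"), fits_latin1("\u03bb"), lossy)
```

Anything for which the check fails has to be replaced or dropped when the text is squeezed into an 8-bit Latin1 representation.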

From my reading of the help file mbyte.txt, especially Section 10 Using UTF-8, you want to :set encoding=utf-8 before reading the file. Your .vimrc might be a good place to do that. Another option is a modeline in your .tcl file that tells vim to assume UTF-8. Something like

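    # vim: set enc=utf-8 fenc=utf-8

"near" the top or bottom of the file is the usual form. Be aware, though, that the Vim help for 'encoding' says the option cannot be set from a modeline, so setting it in your .vimrc is the more dependable route.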

The other huge caveat is that you also need to have fonts configured that cover enough Unicode code points to be useful to you. I believe VIM defaults to "fixedsys" on Windows, which is not a Unicode font. You will want to change to Lucida Console at least, if not to something even more programmer-friendly such as Hack[1], Source Code Pro[2], or DejaVu Sans Mono[3], which have good Unicode coverage and other features useful for coding without eyestrain.

[1]: http://sourcefoundry.org/hack/
[2]: http://www.adobe.com/products/type/fonts-by-adobe.html
[3]: http://dejavu-fonts.org/

You may also want your console windows to understand UTF-8. If you have the console set to use an appropriate font (I personally use Hack for both my consoles and my editors), then all you need to do is run CHCP 65001 at the CMD prompt to switch to the UTF-8 codepage.


>but<

$ /c/Program\ Files/tcl/bin/tclsh u.tcl
invalid command name "puts"
    while executing
"puts "This is a copyright symbol: ©.""
    (file "k.tcl" line 1)
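One plausible reading of that error, and it is only an assumption on my part, is that the script was read using an 8-bit system codepage such as cp1252, so the three BOM bytes became three visible characters glued to the front of the first word. A minimal Python sketch of that effect (the sample script text is made up):

```python
import codecs

# A script saved as UTF-8 with a BOM in front of the first command.
raw = codecs.BOM_UTF8 + b'puts "hello"\n'

# Assumption: the reader decodes the file as cp1252 rather than UTF-8.
as_cp1252 = raw.decode("cp1252")

# The first word is now the three BOM characters followed by "puts".
first_word = as_cp1252.split()[0]

# Decoding as UTF-8 with BOM handling gives the intended first word.
as_utf8 = raw.decode("utf-8-sig")

print(repr(first_word), repr(as_utf8.split()[0]))
```

Under that assumption the interpreter is asked to run a command whose name is not "puts" at all, which would match an "invalid command name" complaint on line 1.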

While adding the option "-encoding utf-8" to the tclsh command line makes it work, it does not work when I double-click on the .tcl file as I have no way to set any sort of encoding option -- unless I have to make a windows shortcut for each and every .tcl file that I want to run and put the -encoding there.

There is a way to do this automatically. Windows uses registry keys to associate a file extension with a logical file type, and a file type with the command to "open" it. The assoc and ftype commands provide a simpler interface for viewing and setting the needed registry keys. For ActiveTcl, I have:

    C:...>assoc .tcl
    .tcl=ActiveTclScript

    C:...>ftype ActiveTclScript
    ActiveTclScript="C:\Programs\Tcl\bin\wish86.exe" "%1" %*

You can change the definition of ActiveTclScript to include -encoding utf-8.

Note that this would make your installation less consistent with those of other users, and is thus likely not the first choice for addressing this issue.

So, how can one use a program source file encoded in utf-8?

--
Ross Berteig                               [email protected]
Cheshire Engineering Corp.           http://www.CheshireEng.com/
+1 626 303 1602
