Hi Stephen,
> In the documents you serve, do you specify the encoding *within* the
> document, at the top of the HTML file for example? Or are you serving
> XML in which case the default for that is utf-8 anyway (I think, off
> the top of my head...).
Usually we specify it both ways: in the meta http-equiv element of the HTML
head and in the Content-Type header of the HTTP response.
> Another possibility is that you happen to be using browsers which are
> smart enough to reparse a document if it doesn't happen to be in the
> encoding it first expected. I think the big guys do this -- not sure
> your mobile phone will be so forgiving.
I'd say we are perfectly happy with just setting up the config file via the
ns/encodings + ns/mimetypes sections and letting the server handle the rest.
The fewer knobs the better. We know (or can control) the encoding of files on
disk and we set up the encoding of the database; then we simply want to
return the specified encoding. We have different sites running with
iso-8859-1, iso-8859-15 and utf-8.
Usually we have no need to change the encoding at runtime, but if we do, I
would like ns_conn to do the expected thing.
Relying only on (i.e. being forced to use) UTF-8 would not be optimal, as a
potential NaviServer user might want to use another encoding or avoid a
UTF/Unicode database setup for whatever reason, e.g. performance, storage or
collation issues (sort order).
For us, since we only deal with the web and HTTP, moving every installation
to UTF-8 is nevertheless the way to go.
> (This applies to case 3: supporting multiple encodings)
>
>
> I agree with Zoran. ns_conn encoding should be the way to change the
> encoding (input or output) at runtime.
yes.
> Another place this trips up: In the config for the tests Michael added:
>
> ns_section "ns/mimetypes"
> ns_param .utf2utf_adp "text/plain; charset=utf-8"
> ns_param .iso2iso_adp "text/plain; charset=iso-8859-1"
>
> ns_section "ns/encodings"
> ns_param .utf2utf_adp "utf-8"
> ns_param .iso2iso_adp "iso-8859-1"
>
> The ns/encodings are the encoding to use to read an ADP file from
> disk, according to extension. It solves the problem: the web designer's
> editor doesn't support utf-8.
You are focusing here only on web designers and ADP files, but it could be
any other kind of usage as well (file exports etc.).
> But, the code is actually expecting Tcl encoding names here, not a
> charset, so this config is busted. It doesn't show up in the tests
> because the only alternative encoding we're using is iso-8859-1, which
> also happens to be the default.
This is correct, and an annoying thing to be aware of.
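For reference, a version of the test config that passes Tcl encoding names
(as listed by Tcl's `encoding names` command) instead of MIME charset names
would presumably look like this. Only the iso entry actually changes, since
"utf-8" is spelled the same in both worlds; the ns/mimetypes section keeps
the charset spelling because that string goes into the HTTP header:

```tcl
ns_section "ns/encodings"
# ns/encodings expects Tcl encoding names, not MIME charsets:
# "iso8859-1" (Tcl) instead of "iso-8859-1" (charset).
ns_param .utf2utf_adp "utf-8"
ns_param .iso2iso_adp "iso8859-1"
```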
> The strategy of driving the encoding from the mime-type has some other
> problems. You have to create a whole bunch of fake mime-types /
> extension mappings just to support multiple encodings (the
> ns/mimetypes above).
>
> What if there is no extension? Or you want to keep the .adp (or
> whatever) extension, but serve content in different encodings from
> different parts of the URL tree? Currently you have to put code in
> each ADP to set the mime-type (which is always the same) explicitly,
> to set the charset as a side effect.
This is true. It does not affect our apps, as we commit to one encoding and
then cache the HTML output to files on disk, but it is not nice if you need
to change it.
> * utf-8 by default
> * mime-types are just mime-types
> * always hack the mime-type for text data to add the charset
> * text is anything sent via Ns_ConnReturnCharData()
> * binary is a Tcl bytearray object
> * static files are served as-is, text or binary
> * multiple encodings are handled via calling ns_conn encoding
> * folks need to do this manually. no more file extension magic
> I think a nice way for folks to handle multiple encodings is to
> register a filter, which you can of course use to simulate the file
> extension scheme in place now, the AOLserver 4.5 ns_register_encoding
> stuff, and more, because it's a filter. You can also do things like
> check query data or cookies for the charset to use.
As our app has one main filter that handles file dispatching, we would
simply place it there.
But, if possible, we should find a solution that is both flexible and
compatible with respect to the "file extension magic".
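A rough sketch of how such a filter could emulate the current extension
scheme, assuming ns_conn encoding accepts a Tcl encoding name and that a
filter proc gets the filter reason appended as its last argument. The array
::ext_encoding and the proc set_encoding_filter are made-up names for
illustration. Note that this only sets the connection (output) encoding; it
does not change how the ADP file itself is read from disk:

```tcl
# Hypothetical mapping from file extension to Tcl encoding name.
array set ::ext_encoding {
    .utf2utf_adp utf-8
    .iso2iso_adp iso8859-1
}

# Hypothetical filter: pick the connection encoding from the URL's extension.
proc set_encoding_filter {why} {
    set ext [file extension [ns_conn url]]
    if {[info exists ::ext_encoding($ext)]} {
        ns_conn encoding $::ext_encoding($ext)
    }
    # Let request processing continue normally.
    return filter_ok
}

foreach method {GET POST} {
    ns_register_filter preauth $method /* set_encoding_filter
}
```

The same proc could just as well look at the URL prefix, a query parameter
or a cookie instead of the extension, which is the flexibility Stephen
describes.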
> Questions that need answered:
>
> * can we junk charset aliases in nsd/encodings.c and use a dir of symlinks?
I would vote for a lookup function that is not filesystem-based.
> * can we junk ns/encodings in 2006?
I would not recommend it, as the server would lose that capability.
Bernd.