[bug-idutils] JavaScript support and non-ASCII identifiers

Jim Blandy Thu, 13 Sep 2012 18:59:03 -0700

[apologies if this isn't the right list; please redirect if that's the case]


I've started toying with adding JavaScript support to idutils. The
JavaScript grammar is defined in terms of a stream of UTF-16 code
units (not, unfortunately, in terms of Unicode code points), and JS
identifiers can contain non-ASCII characters. What kind of 'struct
token' should I return for that? Is there a defined encoding for
non-ASCII characters in the ID database?

If we elect to use UTF-8 in ID databases, then we'll need to depend on
something like iconv to convert to and from the locale's current
encoding, assuming that the files being scanned use that encoding.
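
To make the conversion step concrete, here is a rough sketch using POSIX
iconv. The helper name to_utf8 is made up for illustration and is not
part of idutils. It takes the source encoding as a parameter; a real
scanner would presumably pass nl_langinfo(CODESET) after calling
setlocale(LC_ALL, ""), on the assumption that the input files use the
locale's encoding.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of idutils: convert INLEN bytes at IN
   from the encoding named FROMCODE into UTF-8, writing at most OUTCAP
   bytes to OUT.  Returns the number of bytes written, or (size_t)-1 if
   the conversion could not be set up or failed part way (invalid
   input, or OUT too small).  */
static size_t
to_utf8 (const char *fromcode, const char *in, size_t inlen,
         char *out, size_t outcap)
{
  iconv_t cd = iconv_open ("UTF-8", fromcode);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* iconv takes a non-const char **, but only reads the input.  */
  char *inp = (char *) in;
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);

  if (rc == (size_t) -1)
    return (size_t) -1;
  return outcap - outleft;
}
```

This converts a whole buffer in one call; a scanner reading a file in
chunks would have to handle multibyte sequences split across chunk
boundaries, which the sketch does not attempt.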

If we elect to use the locale's coded character set in ID databases,
then interpreting a database's contents correctly will depend on the
coded character set being the same as it was when the database was
created, which seems unfortunate. The JavaScript scanner would still
need to use iconv to get the UTF-16 stream it needs, so this approach
won't avoid introducing a dependency on iconv.

For now, I'm going to punt on non-ASCII characters, treating them all
as identifier components.
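
A minimal sketch of what that punt could look like, assuming the scanner
works on raw bytes: every byte with the high bit set counts as part of
an identifier, whatever encoding the file is in. The function names are
hypothetical, and the ASCII rules (letters, digits, '_' and '$', no
leading digit) are a simplification of the JavaScript grammar.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if byte C may appear inside an
   identifier.  Any byte >= 0x80 is accepted without decoding it.  */
static int
is_ident_byte (unsigned char c)
{
  return (c >= 0x80
          || (c >= 'a' && c <= 'z')
          || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9')
          || c == '_' || c == '$');
}

/* Hypothetical helper: length in bytes of the identifier starting at
   S, or 0 if S does not start one (for example, a leading digit).  */
static size_t
ident_length (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;

  if (p[0] >= '0' && p[0] <= '9')
    return 0;
  while (p[n] != '\0' && is_ident_byte (p[n]))
    n++;
  return n;
}
```

With this, a UTF-8 or Latin-1 identifier is stored in the database as
whatever bytes appeared in the file, so the encoding question is
deferred rather than answered.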

