[apologies if this isn't the right list; please redirect if that's the case]
I've started toying with adding JavaScript support to idutils. The JavaScript grammar is defined in terms of a stream of UTF-16 code units (not, unfortunately, in terms of Unicode code points), and JS identifiers can contain non-ASCII characters. What kind of 'struct token' should I return for such an identifier? Is there a defined encoding for non-ASCII characters in the ID database?

If we elect to use UTF-8 in ID databases, then we'll need to depend on something like iconv to convert to and from the locale's current encoding, assuming that the files being scanned actually use that encoding.

If we elect to use the locale's coded character set in ID databases, then interpreting a database's contents correctly will depend on the coded character set being the same as it was when the database was created, which seems unfortunate. The JavaScript scanner would still need to use iconv to get the UTF-16 stream it needs, so this approach won't avoid introducing a dependency on iconv.

For now, I'm going to punt on non-ASCII characters, treating them all as identifier components.
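
A minimal sketch of what that punt could look like, assuming the scanner reads its input as a plain byte stream in an ASCII-compatible encoding such as UTF-8 or Latin-1 (it would not work on raw UTF-16 input). The names is_ident_byte and scan_ident are hypothetical, made up for illustration; they are not part of idutils, and this is not meant as the actual scanner.

```c
#include <stddef.h>

/* Hypothetical helper, not idutils API: nonzero if byte c may appear in an
   identifier under the "punt" rule.  ASCII letters, digits, '_' and '$'
   are accepted, and so is every byte >= 0x80, without decoding it.
   Explicit ranges are used instead of isalnum() so that the result does
   not depend on the current locale. */
static int is_ident_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_' || c == '$'
        || c >= 0x80;
}

/* Hypothetical helper: length in bytes of the identifier starting at s,
   or 0 if s does not start with one.  A leading digit is rejected. */
static size_t scan_ident(const char *s)
{
    const unsigned char *start = (const unsigned char *) s;
    const unsigned char *p = start;

    if (!is_ident_byte(*p) || (*p >= '0' && *p <= '9'))
        return 0;
    while (is_ident_byte(*p))
        p++;
    return (size_t) (p - start);
}
```

The bytes of the identifier are passed through untouched, so whatever encoding the source file used is what would end up in the ID database. That sidesteps iconv for now, but it leaves the encoding question above open.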