Hi.
Has anyone here faced or solved this problem before? I am a novice with
Unicode, much less human languages.
I have created a very simple DSL with antlr for pseudo natural language.
Nothing special.
It currently recognizes the usual identifiers:
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
and now I have a Japanese user who wants both identifiers and keywords
localized in his native language. The platform is .NET or Mono, and the
input stream supports UTF-16. I'd like to solve the problem once for all
languages and not just Japanese and English.
Localizing the keywords should be simple enough. They're different but
fixed for each language.
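To make the keyword part concrete, here is a rough sketch in Python rather than ANTLR. It assumes the lexer can emit one generic word token and then look the text up in a per-language table. The table contents, the token names, and the classify_word helper are all made up for illustration; they are not my DSL's real vocabulary.

```python
# Hypothetical keyword tables: the token names and the Japanese words are
# placeholders for illustration, not the DSL's real vocabulary.
KEYWORDS = {
    "en": {"if": "IF", "then": "THEN", "else": "ELSE"},
    "ja": {"もし": "IF", "ならば": "THEN", "さもなければ": "ELSE"},
}


def classify_word(word, locale):
    """Return the shared token type for a word, or "ID" if it is not a keyword."""
    return KEYWORDS[locale].get(word, "ID")
```

With something like this, adding a language would mean adding a table rather than a lexer, assuming the keywords can always be matched as whole words.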
The identifiers are trickier. I need to exclude any members of the
(localized) whitespace or ordinal number sets. So at first I thought
ID    : ~( WS | DIGIT | KSEP )+ ~( WS | KSEP )*
WS    : <localized list of whitespace codepoints>
DIGIT : <localized list of ordinal number codepoints>
KSEP  : <localized thousands separator>
This turns out to be very naive, and I see this getting ugly fast. Already
I have to localize the DSL keywords so there's no way around writing
multiple lexers. So far I have only two languages: English and Japanese.
But if this catches on, other users will want their own. I'd like to
minimize the number of lexers I need to maintain or at least maximize code
reuse between them.
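One alternative I can imagine, sketched in Python only to show the idea: classify identifier characters by Unicode general category instead of maintaining per-language lists. This is an assumption about what might work, not something I have tried in ANTLR. The function names are hypothetical, and the category choices (letters and letter numbers to start; marks, decimal digits and connector punctuation to continue) roughly follow the Unicode identifier guidelines in UAX #31.

```python
import unicodedata


def is_id_start(ch):
    # Any letter category (Lu, Ll, Lt, Lm, Lo), letter numbers (Nl), or underscore.
    cat = unicodedata.category(ch)
    return ch == "_" or cat.startswith("L") or cat == "Nl"


def is_id_part(ch):
    # Start characters plus combining marks, decimal digits, and connectors.
    cat = unicodedata.category(ch)
    return is_id_start(ch) or cat in ("Mn", "Mc", "Nd", "Pc")


def is_identifier(text):
    # Non-empty, valid first character, valid remaining characters.
    return (
        bool(text)
        and is_id_start(text[0])
        and all(is_id_part(c) for c in text[1:])
    )
```

Under these rules a Japanese word such as 合計 would be accepted as an identifier, while anything starting with a digit (including full-width digits) or containing whitespace (including the ideographic space U+3000) would be rejected, with no language-specific tables.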
I figure this question must come up for DSLs pretty regularly. Although we
more or less accept using a subset of Latin characters -- and usually just
ASCII -- for *general* purpose programming, the use case for DSLs almost
begs for localized identifiers and keywords. The users in this case are
ordinary business people, not programmers.
Any advice?
Thanks,
-CM