On 2015-05-13 07:43, Takatsugu Nokubi wrote:
> "ー" (prolonged sound mark) is a popular symbol in Japanese.
> And the rule itself is simple:
>
> The symbol is placed after Hiragana or Katakana, not Kanji.

If the scripts (Hiragana, Katakana, Kanji) have non-overlapping Unicode ranges, it should be possible to use regular expressions like these:

  Hiragana:     [\u3040-\u309F]
  not Hiragana: [^\u3040-\u309F]

So if you want to find character "X" after Hiragana, you could try this:

  <token regexp="yes">.*[\u3040-\u309F]X</token>

Or maybe, depending on tokenization:

  <token regexp="yes">[\u3040-\u309F]+</token>
  <token>X</token>

Regards
Daniel

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
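
The range-based idea in the reply above can be sanity-checked outside LanguageTool with a small Python sketch. It is only an illustration, not a LanguageTool rule: the function names are hypothetical, the Katakana range (U+30A0 to U+30FF) and the Kanji range (U+4E00 to U+9FFF, the basic CJK Unified Ideographs block only) are assumptions added here, and Python's `re` module stands in for the Java regex engine that LanguageTool uses. Note that "ー" itself (U+30FC) lies inside the Katakana block, so a doubled mark also counts as "after Katakana" in this sketch.

```python
import re

# The prolonged sound mark preceded by a Hiragana or Katakana character.
# Hiragana: U+3040-U+309F (from the email); Katakana: U+30A0-U+30FF (assumed here).
MARK_AFTER_KANA = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]\u30FC")

# The mark preceded by a Kanji character.
# Assumption: only the basic CJK Unified Ideographs block, U+4E00-U+9FFF.
MARK_AFTER_KANJI = re.compile(r"[\u4E00-\u9FFF]\u30FC")


def mark_follows_kana(text: str) -> bool:
    """Return True if some "ー" in text directly follows Hiragana or Katakana."""
    return MARK_AFTER_KANA.search(text) is not None


def mark_follows_kanji(text: str) -> bool:
    """Return True if some "ー" in text directly follows a Kanji character."""
    return MARK_AFTER_KANJI.search(text) is not None


if __name__ == "__main__":
    for sample in ["らーめん", "ラーメン", "漢ー"]:
        print(sample, mark_follows_kana(sample), mark_follows_kanji(sample))
```

Whether the same character classes behave identically inside a LanguageTool `<token regexp="yes">` element depends on how the Japanese tokenizer splits the text, as noted in the reply.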