Yeah, Fran, there are at least hundreds (maybe thousands) of them in the corpus...
I don't know how or where to file an issue for that. I was going to create an issue in apertium-tat, but it is a global thing, not just for Tatar...

On Tue, 13 Nov 2018 at 20:09, Francis Tyers <fty...@prompsit.com> wrote:
> On 2018-11-13 16:29, Kevin Brubeck Unhammer wrote:
> > mansur <6688...@gmail.com> wrote:
> >
> >> Hello!
> >>
> >> There are so many symbols that are not recognized by Apertium's tagger and
> >> not marked in any way. For example, apertium-tat does not recognize the
> >> following symbols:
> >> _ @ % ~ |
> >> and many others.
> >>
> >> Is it possible to use some special tag (^_/_<unknown>$) for such cases?
> >
> > Yes, just give them analyses in tat.dix, e.g.:
> >
> > <e><re>[_@%~|]</re><p><l/><r><s n="symb"/></r></p></e>
> >
> > (untested)
>
> I generally use <sym> for that, but there are a lot of Unicode symbols
> and it's impossible to list them all in the .dix file, there should be
> some kind of builtin for that I think.
>
> Fran
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
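For what it's worth, here is a rough sketch of the kind of rule such a builtin could use: instead of listing characters one by one in the .dix, classify each character by its Unicode general category (P* = punctuation, S* = symbol). All five symbols above fall into these categories (`_` is Pc, `@` and `%` are Po, `~` and `|` are Sm). This is only an illustration in Python. The function names and the `sym` tag are my own assumptions, not an existing Apertium/lttoolbox feature.

```python
import unicodedata
from collections import Counter


def is_symbol(ch: str) -> bool:
    """True if ch's Unicode general category is punctuation (P*) or symbol (S*).

    Hypothetical helper: one possible rule for a built-in symbol class,
    not an existing Apertium/lttoolbox API.
    """
    return unicodedata.category(ch)[0] in ("P", "S")


def symbol_counts(text: str) -> Counter:
    """Count every punctuation/symbol character in text, keyed by the character."""
    return Counter(ch for ch in text if is_symbol(ch))


def analyse_symbol(ch: str) -> tuple[str, str, str]:
    """Return a (surface, lemma, tag) triple; the "sym" tag is an assumption."""
    if not is_symbol(ch):
        raise ValueError(f"{ch!r} is not a punctuation/symbol character")
    return (ch, ch, "sym")


if __name__ == "__main__":
    # Letters and spaces are skipped; only P*/S* characters are counted.
    sample = "user_name@example % ~ | ¶ € →"
    for ch, n in symbol_counts(sample).most_common():
        print(repr(ch), unicodedata.category(ch), unicodedata.name(ch, "?"), n)
```

Running `symbol_counts` over the corpus would also give a quick picture of how many distinct symbols are actually affected.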