"Bernard Chardonneau" <[email protected]> writes:

>> Date: Sun, 22 Jul 2012 17:01:30 +0200
>> From: Jacob Nordfalk <[email protected]>
>> To: [email protected]
>> Reply-To: [email protected]
>> Subject: Re: [Apertium-stuff] New applications: Apertium Caffeine
>> and Apertium plug-in for OmegaT
>>
>> (..................)
>>
>> =======
>>
>> WRT formatters and deformatters, I think it's fine to make a simple
>> (de)formatter like the one needed for OmegaT or for HTML, if you anticipate
>> they are needed for plugins. More advanced (de)formatters are for the C++
>> version, which has a sophisticated (some would say complicated :-) way of
>> (de)formatting which I *don't* recommend you look into.
>>
>> But you could play with the C++ version to get a feel for it. For example:
>>
>> $ echo "I am <b>fine</b> and all. :-)" | apertium-deshtml
>>
>> I am[ <b>]fine[<\/b> ]and all. :-).[][
>> ]
>>
>> $ echo "I am <b>fine</b> and all. :-)" | apertium-deshtml | apertium-rehtml
>> I am <b>fine</b> and all. :-)
>>
>> Stephen Tigener worked with the text formatter. If he has time, he could
>> probably put something together quickly.
>>
>> Jacob
>>
>> --
>
> For my part, I have been thinking for several months about writing
> formatters and deformatters for man pages and also for mnemonic interface
> files (on each line, an identifier that must not be changed on the left
> side and a printf-formatted string on the right side).
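To make the mnemonic-file idea a little more concrete, here is a rough sketch of what a very small deformatter for such files might look like, written as a POSIX shell function around sed. It is not Bernard's code, and everything in it is an assumption: the function name mnemo_deformat is hypothetical, the line layout (identifier, whitespace, printf-style string) is taken from the description above, and the set of characters escaped with a backslash only approximates the Apertium stream format. Following the bracket convention visible in the apertium-deshtml output, it puts the identifier, printf conversions such as %s or %ld, and C escapes such as \n inside [...] superblanks so that the translator leaves them alone.

```shell
#!/bin/sh
# Hypothetical deformatter sketch for "mnemonic" interface files, assuming
# each line is: IDENTIFIER <whitespace> printf-style string.
# Reads the file on stdin and writes an Apertium-style stream on stdout.
mnemo_deformat () {
    # 1. Backslash-escape characters assumed to be reserved in the stream
    #    format: ] [ \ ^ $ @ / < >  (an approximation, not the full set).
    # 2. Wrap the leading identifier and the whitespace after it in [...].
    # 3. Wrap printf conversions (flags, width, length, conversion, or %%)
    #    in [...].
    # 4. Wrap C escapes such as \n (already doubled to \\n by step 1) in [...].
    sed -e 's|[][\\^$@/<>]|\\&|g' \
        -e 's|^\([^[:space:]]*[[:space:]]*\)|[\1]|' \
        -e 's|%[-+ #0-9.*]*[hlLqjzt]*[diouxXeEfFgGaAcspn%]|[&]|g' \
        -e 's|\\\\[abfnrtv]|[&]|g'
}
```

Used as `mnemo_deformat < messages.mnemo` (hypothetical file name), a line such as `MSG_HELLO   Hello %s, you have %ld items\n` should come out as `[MSG_HELLO   ]Hello [%s], you have [%ld] items[\\n]`. A matching reformatter (the counterpart of apertium-rehtml) would still be needed to strip the unescaped brackets and backslashes again, and unlike the real deformatters this sketch does not mark sentence or line ends (the `.[][` ... `]` seen in the deshtml output).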
Are these formatters online somewhere?

> I recently looked at the destxt source file to see the programming style,
> and I was surprised to see such a big source file just to do what it is
> supposed to do. My source files would be much shorter.

http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium/apertium/txt-format.xml
is the source for the txt format, and it is not very long. But it's
"compiled" to cpp files by a combination of some xslt scripts and flex.
The same scripts are used for html and some other formats, but of course
it's not that easy for more complicated formats like mediawiki (see
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-mediawiki/).

> If it is not considered a crime to use non-UTF-8 texts for translations

I believe it may be in some jurisdictions, but IANAL.

> I think it would also be nice to add the suffix iso to the name of the
> format. So the options -f txtiso, -f htmliso, -f maniso and -f mnemoiso
> (some more?) would first convert an ISO-8859-1 (or a similar ISO-8859-n
> variant) input file to UTF-8, then deformat, translate and reformat it,
> and finally convert the output back to the ISO-8859-1 charset (assuming
> the source and target languages both use the Western European alphabet).

That sounds to me like trying to make Apertium do more than it should …
(also, that would be --encoding, which is unrelated to --format). Why not
just do

    iconv -f latin1 -t utf8 | apertium from-to | iconv -f utf8 -t latin1

If you do that a lot, you could make a bash function:

    apertium-l1 () {
        iconv -f latin1 -t utf8 | apertium "$@" | iconv -f utf8 -t latin1
    }

and then just call apertium for UTF-8 input/output, and apertium-l1 for
Latin-1 input/output.

--
Kevin Brubeck Unhammer

GPG: 0x766AC60C
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
