On 2014-06-11 16:01, Kevin Brubeck Unhammer wrote:
> Francis Tyers <[email protected]> writes:
>
>> On Tue, 25 Mar 2014 at 12:17 +0000, Jim O'Regan wrote:
>
> [...]
>
>>> Also, I have a tiny feature that allows the user to specify a set of
>>> characters to be ignored at runtime (motivated primarily by soft
>>> hyphens, but I've left it general[1]). I sent the patch to Sergio to
>>> review, but I'd really rather get it in now than wait n years until
>>> the next release :)
>>>
>>> For the curious, I've attached the patch.
>>>
>>> Current behaviour is:
>>> $ echo testing | lttoolbox/lt-proc ~/Apertium/apertium-en-es/en-es.automorf.bin
>>> ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^ing/*ing
>>>
>>> Using this as soft-hyphen.icx:
>>>
>>> <?xml version="1.0"?>
>>> <ignored-chars>
>>> <char value="­ "/>
>>> </ignored-chars>
>>>
>>> $ echo testing | lttoolbox/lt-proc -i soft-hyphen.icx ~/Apertium/apertium-en-es/en-es.automorf.bin
>>> ^testing/test<vblex><ger>/test<vblex><pprs>/test<vblex><subs>/testing<n><sg>$
>>
>> Could this just be included as default? I mean, are there any cases
>> in which we would not want to skip a soft-hyphen?
>
> So having an icx on the command line is nice for developers, and people
> who use lt-proc for non-Apertium things. But it would require changing
> modes files for any pairs that want to take advantage of it … I think
> maybe a hardcoded ignore-list in lttoolbox would be more helpful to
> more users. Are there other use-cases than soft-hyphens? Or cases where
> we want to _not_ ignore the soft-hyphen?
>
> (Tino Didriksen noted some other possibly skippable stuff:
> http://www.fileformat.info/info/unicode/category/Cf/list.htm )

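To make the ignore-list idea above concrete, here is a minimal sketch in Python. It is not the attached patch and not how lttoolbox does it (lttoolbox is C++); it only assumes the icx layout quoted above, with `<char value="..."/>` elements inside `<ignored-chars>`. The function names `load_ignored_chars` and `strip_ignored` are made up for illustration, and the default set containing only U+00AD mirrors the "hardcoded ignore-list" suggestion.

```python
import unicodedata
import xml.etree.ElementTree as ET

SOFT_HYPHEN = "\u00ad"  # U+00AD, Unicode general category Cf (format)


def load_ignored_chars(icx_text):
    """Collect the characters listed in <char value="..."/> elements.

    Hypothetical helper: assumes the icx layout shown in the quoted mail.
    """
    root = ET.fromstring(icx_text)
    ignored = set()
    for char in root.iter("char"):
        # Drop surrounding whitespace so only the listed characters remain
        # (an assumption of this sketch, not documented icx behaviour).
        ignored.update(char.get("value", "").strip())
    return ignored


def strip_ignored(text, ignored=frozenset({SOFT_HYPHEN})):
    """Remove ignored characters; the default is a soft-hyphen-only list."""
    return "".join(ch for ch in text if ch not in ignored)


# The soft hyphen is written as a character reference so it stays visible.
ICX = '<ignored-chars><char value="&#xAD;"/></ignored-chars>'
ignored = load_ignored_chars(ICX)

print(strip_ignored("test\u00ading", ignored))  # testing
print(unicodedata.category(SOFT_HYPHEN))        # Cf
```

With the default argument, `strip_ignored` behaves like a built-in ignore-list that needs no changes to modes files; passing a set loaded from an icx file corresponds to the `-i` option. Widening the default to every Cf character would be a one-line change, but whether that is safe for all language pairs is exactly the open question in this thread.
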
I would say that in the first version we should just skip the soft hyphen.
We can put out minor releases if something else comes up.

F.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
