On Fri, Feb 28, 2014 at 6:28 PM, Jimmy O'Regan <[email protected]> wrote:
> On 28 February 2014 18:21, Alex Aruj <[email protected]> wrote:
>> Hi group,
>> ...
>> Is the priority to make the charlifter case-sensitive and for it to respect
>> superblanks exactly as in the example in the box laid out here
>> http://wiki.apertium.org/wiki/Superblanks?
>>
>
> Respecting superblanks is a must: diacritic restoration must not be
> applied to them.
>
> Case should definitely be _respected_: the output needs to match the
> input in terms of case.
>
> As for case sensitivity, Kevin Scannell is the person to ask for a
> definitive answer. My feeling is that case sensitivity can
> potentially be more accurate, but in the absence of sufficient data,
> case insensitive (trained on lowercase) should be the default.
>
This is spot on. You'll do better in most cases with case sensitive
models (e.g. for Jimmy: Irish "Éire" vs. "eire") unless there is very
limited training data. For individual cases, you can always try both
and see which performs better.

Kevin

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
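
For illustration only, here is a minimal sketch of the two requirements
discussed above: leave superblanks untouched, and make the output match
the input in case when the model was trained on lowercase. It is not
charlifter's code. The lookup table LOWERCASE_MODEL and the helpers
match_case, restore_word and restore are hypothetical names, the table
is a toy stand-in for a trained model, and superblanks are assumed to be
simple bracketed spans with no escaped brackets inside.

```python
import re

# Hypothetical toy table standing in for a model trained on lowercase text.
LOWERCASE_MODEL = {"ta": "tá", "failte": "fáilte"}

# Assumption: a superblank is a bracketed span such as "[<b>]".
# Escaped brackets are not handled in this sketch.
SUPERBLANK = re.compile(r"(\[[^\]]*\])")
WORD = re.compile(r"\w+")


def match_case(source, restored):
    """Re-apply the case pattern of the input word to the restored word."""
    if source.isupper():
        return restored.upper()
    if source[0].isupper():
        return restored[0].upper() + restored[1:]
    # Lowercase or mixed-case input: return the restored form unchanged.
    return restored


def restore_word(word):
    """Look the word up in lowercase, then copy the input's case back."""
    restored = LOWERCASE_MODEL.get(word.lower(), word)
    return match_case(word, restored)


def restore(text):
    """Restore diacritics outside superblanks; copy superblanks verbatim."""
    out = []
    # With a capturing group, re.split puts the superblanks at odd indexes.
    for i, part in enumerate(SUPERBLANK.split(text)):
        if i % 2 == 1:
            out.append(part)
        else:
            out.append(WORD.sub(lambda m: restore_word(m.group(0)), part))
    return "".join(out)


print(restore("Failte [<b>ta</b>] TA ta"))
```

A case-sensitive variant would look up the word as typed instead of
word.lower(), which keeps pairs like "Éire" and "eire" apart but needs
more training data. Running both variants over held-out text and
comparing accuracy is one way to "try both".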
