On 11 June 2014 15:01, Kevin Brubeck Unhammer <[email protected]> wrote:
> Francis Tyers <[email protected]> writes:
>
>> On Tue, 25 Mar 2014 at 12:17 +0000, Jim O'Regan wrote:
>
> [...]
>
>>> Also, I have a tiny feature that allows the user to specify a set of
>>> characters to be ignored at runtime (motivated primarily by soft
>>> hyphens, but I've left it general[1]). I sent the patch to Sergio to
>>> review, but I'd really rather get it in now than wait n years until
>>> the next release :)
>>>
>>> For the curious, I've attached the patch.
>>>
>>> Current behaviour is:
>>> $ echo testing |lttoolbox/lt-proc
>>> ~/Apertium/apertium-en-es/en-es.automorf.bin
>>> ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^ing/*ing
>>>
>>> Using this as soft-hyphen.icx:
>>>
>>> <?xml version="1.0"?>
>>> <ignored-chars>
>>> <char value="­ "/>
>>> </ignored-chars>
>>>
>>> echo testing |lttoolbox/lt-proc -i soft-hyphen.icx
>>> ~/Apertium/apertium-en-es/en-es.automorf.bin
>>> ^testing/test<vblex><ger>/test<vblex><pprs>/test<vblex><subs>/testing<n><sg>$
>>
>> Could this just be included as default ? I mean, are there any cases in
>> which we would not want to skip a soft-hyphen ?
>
> So having an icx on the command line is nice for developers, and people
> who use lt-proc for non-Apertium things. But it would require changing
> modes files for any pairs that want to take advantage of it
BFD. That just needs something like this:
Index: modes2bash.xsl
===================================================================
--- modes2bash.xsl (revision 50128)
+++ modes2bash.xsl (working copy)
@@ -23,6 +23,7 @@
<xsl:param name="prefix"/>
<xsl:param name="dataprefix"/>
+<xsl:param name="autoicx"/>
<xsl:template match="modes">
<xsl:for-each select="./mode">
@@ -45,6 +46,16 @@
</xsl:template>
<xsl:template match="program">
+ <xsl:variable name="prog">
+ <xsl:choose>
+ <xsl:when test="$autoicx = 'yes' and ./@name = 'lt-proc' and contains(./file/@name, 'automorf')">
+ <xsl:value-of select="concat('lt-proc -i ', $dataprefix, '/default.icx')"/>
+ </xsl:when>
+ <xsl:otherwise>
+ <xsl:value-of select="./@name"/>
+ </xsl:otherwise>
+ </xsl:choose>
+ </xsl:variable>
<xsl:choose>
<xsl:when test="@prefix">
<xsl:value-of select="@prefix"/>
@@ -51,7 +62,7 @@
<xsl:value-of select="string('/')"/>
</xsl:when>
</xsl:choose>
- <xsl:value-of select="./@name"/>
+ <xsl:value-of select="$prog"/>
<xsl:for-each select="./*">
<xsl:value-of select="string(' ')"/>
<xsl:apply-templates select="."/>
...and from there it's a small change to add a ' $1', and another small
change to the driver script, to allow user-supplied icx files.
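
For anyone who would rather not read XSLT, the rule the patch adds boils down to roughly the following. This is only a Python sketch of the selection logic, not code from Apertium: `program_command` and its arguments are names made up for illustration, and the data prefix in the usage line is a placeholder. The only things taken from the patch are the `autoicx` switch, the 'automorf' test and `default.icx`.

```python
def program_command(name, file_names, dataprefix, autoicx="no"):
    """Return the command to emit for one <program> element of a modes file.

    Hypothetical helper mirroring the <xsl:choose> in the patch: only a
    plain 'lt-proc' whose file name mentions 'automorf' gets the -i option.
    """
    # XPath 1.0 contains() on a node-set only looks at the first node,
    # so only the first <file> is checked here as well.
    first_file = file_names[0] if file_names else ""
    if autoicx == "yes" and name == "lt-proc" and "automorf" in first_file:
        return "lt-proc -i " + dataprefix + "/default.icx"
    # Everything else is passed through unchanged.
    return name


# Placeholder data prefix, for illustration only.
print(program_command("lt-proc", ["en-es.automorf.bin"], "/some/dataprefix", "yes"))
```

With `autoicx` left at its default, or for any program other than `lt-proc`, the name comes back untouched, so existing modes files would behave as before.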
> … I think
> maybe a hardcoded ignore-list in lttoolbox would be more helpful to more
> users. Are there other use-cases than soft-hyphens?
Yes. My use case was not soft hyphens, I just applied it to them. I
can't say what that was though - if you'd asked 3 months ago I could
have told you, but I don't remember now. I do remember that it was
quite specific to an individual text, which is why I favoured a
solution that could be tailored.
> Or cases where we
> want to _not_ ignore the soft-hyphen?
No idea. Hardcoding it and waiting for someone to complain is
certainly a way to find out, but not exactly optimal.
>
> (Tino Didriksen noted some other possibly skippable stuff:
> http://www.fileformat.info/info/unicode/category/Cf/list.htm )
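
To make the two options concrete, here is a rough Python sketch (not lttoolbox code) of both: reading an icx file in the format quoted above and dropping the characters it lists, versus a hardcoded rule that drops everything in Unicode category Cf, which is the list linked above. The function names are invented for the example, and the soft hyphen is written as `&#xAD;` so that it is visible.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Same shape as soft-hyphen.icx, with the soft hyphen spelled out.
ICX = (
    '<?xml version="1.0"?>\n'
    "<ignored-chars>\n"
    '<char value="&#xAD;"/>\n'
    "</ignored-chars>"
)


def load_ignored_chars(icx_text):
    """Collect every character listed in an <ignored-chars> document."""
    root = ET.fromstring(icx_text)
    ignored = set()
    for char in root.iter("char"):
        # A value may hold more than one character; each one is ignored.
        ignored.update(char.get("value", ""))
    return ignored


def strip_ignored(text, ignored):
    """Configurable variant: drop only the characters the user listed."""
    return "".join(ch for ch in text if ch not in ignored)


def strip_format_chars(text):
    """Hardcoded variant: drop every character in Unicode category Cf."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


ignored = load_ignored_chars(ICX)
print(strip_ignored("test\u00ading", ignored))
print(strip_format_chars("test\u00ading"))
```

Both calls give the same result for a soft hyphen, but the Cf rule also removes things like U+200C ZERO WIDTH NON-JOINER, which some orthographies rely on, so it is the blunter of the two.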
--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff