Hi,

Currently, TSX rules apply only to two-category sequences, with 
categories defined in the TSX file. These two-category sequences are 
either forbidden or enforced. This is much less than what cg-proc can 
do. What I would like to see is a proper, lightning-fast finite-state 
implementation of VISL CG3 (or at least of a large subset of the kinds 
of rules that VISL CG3 supports). Is there a GSoC idea to finish this up? 
Who would be able to work on it?
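
To make the limitation concrete, here is a minimal sketch of what 
forbid/enforce constraints over two-category sequences can express. 
The category names, the FORBID and ENFORCE_AFTER tables and both 
function names are hypothetical, and the brute-force enumeration is 
for illustration only; it does not reproduce how apertium-tagger 
applies these rules internally.

```python
from itertools import product

# Hypothetical categories; a real TSX file defines its own.
# Forbidden two-category sequences: (previous, current).
FORBID = {("DET", "VERB")}
# Enforced sequences: after the key, only the listed categories may follow.
ENFORCE_AFTER = {"PREP": {"DET", "NOUN"}}


def allowed(prev, curr):
    """Return True if the bigram (prev, curr) violates no constraint."""
    if (prev, curr) in FORBID:
        return False
    followers = ENFORCE_AFTER.get(prev)
    return followers is None or curr in followers


def consistent_paths(ambiguity_classes):
    """Enumerate the tag sequences in which every adjacent pair is allowed."""
    return [
        path
        for path in product(*ambiguity_classes)
        if all(allowed(a, b) for a, b in zip(path, path[1:]))
    ]


# A determiner followed by a word that is ambiguous between noun and verb.
print(consistent_paths([["DET"], ["NOUN", "VERB"]]))
```

Each constraint sees only one adjacent pair of categories, so a 
condition that scans leftwards over any number of adjectives (as a CG 
barrier does) cannot be stated this way.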

Cheers

Mikel

On 07/03/16 at 07:39, Per Tunedal wrote:
> Hi,
> Just a thought: couldn't this kind of rule just as well be implemented
> in the TSX-file that's used to train the tagger? In that case,
> retraining the tagger might do the trick as well.
> Yours,
> Per Tunedal
>
> On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote:
>> Per Tunedal <[email protected]> wrote:
>>
>>> 'ta en blå kon' (= take a blue cone) to Danish. 'kon' might be the
>>> indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the
>>> cow). We have:
>>>
>>>   (kon→ kon<n>/ko<n>)
>>>
>>> Translating the whole sentence would give us:
>>>
>>> tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the
>>> cow)
>>>
>>> Wouldn't that be quite revealing in many cases? In this case e.g. a
>>> statistical language model could easily separate the wheat from the
>>> chaff.
>> That example argues against your point – here the source language has
>> two analyses of "kon", with different ind/def taggings (as it should).
>>
>> This is not a lexical selection problem, but a morphological
>> disambiguation problem.
>>
>> It took me all of five minutes to write a CG rule to select indefinite
>> for nouns after indefinite determiners:
>>
>> LIST IndA = (adj ind) (adj comp) ;
>> SET NotIndA = (*) - IndA ;
>> REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) ;
>>
>> and a quick corpus diff seems to show it generalises well:
>>
>> http://sprunge.us/hhbf?diff
>>
>> -- 
>> Kevin Brubeck Unhammer
>>
>> GPG: 0x766AC60C
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff


-- 
  Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

