Marcin Miłkowski <list-addr...@wp.pl> wrote:

On 2014-04-29 07:02, Dominique Pellé wrote:
> > Daniel Naber <daniel.na...@languagetool.org
> > <mailto:daniel.na...@languagetool.org>> wrote:
> >
> >     On 2014-04-27 22:18, Dominique Pellé wrote:
> >
> >      > <token regexp="yes" postag_group1="foo">ez-(.*)</token>
> >
> >     I'm not sure how this could be implemented in a clean way... wouldn't
> >     this be a rather ugly special case in the tagger to ignore the
> >     tokenization and also split at the hyphen?
> >
> >
> >
> > I'm not sure either how it would be implemented, as I don't know
> > that code well enough. I have not tried to implement it. But I don't
> > think there should be a special case for the hyphen. My example
> > contains a hyphen, but hyphens should not be special.
> > The POS tag should rather be probed on the
> > pattern.matcher.group(1) or the regexp.
> >
> > It's not an ugly case. It's a useful general purpose feature, which
> > can avoid writing Java rules. Writing a Java rule is uglier.
> >
> > Another example where I could use it is for French conjugated
> > verbs in interrogations such as "Peux-tu" (=can you),
> > "Peut-il" (=can he)... where the verb and the pronoun are in
> > the same token in interrogations (again with a hyphen in this
> > example).
> >
> > Right now, erroneous French conjugations such as *Peut-tu* are
> > not detected as an error by LanguageTool (false negative).
> > I could detect it as an error if I could do something more or less
> > like this:
> >
> > <pattern>
> >    <token regexp="yes" postag_group1="V.*"
> > postag_group1_regexp="yes">(.*)-tu
> >       <exception regexp="yes" postag_group1="V.* 2 .*"
> > postag_group1_regexp="yes">(.*)-tu</exception>
> >    </token>
> > </pattern>
> >
> > This would check that what matches (.*) in the token is a
> > verb conjugated in the 2nd person singular (i.e. "V.* 2 .*").
> >
> > The French grammar checker "Grammalecte", based on Lightproof,
> > correctly detects *peut-tu* as an error. Grammalecte and Lightproof do
> > not tokenize, so they work quite differently from LanguageTool. Glancing
> > at the Grammalecte rules (Grammalecte-0.3.9.1/fr-rules.txt), it detects
> > the error using a rule like this:
> >
> > (\w+)-tu <- option("inte") and not morph(\1,
> > "po:(.pre|.imp|ipsi|ifut|cond).* po:2sg", False) and spell(\1) and not
> > re.match("(?i)vite$", \1)
> >      -1> _        # Forme interrogative. « \1 » n’est pas un verbe à la
> > deuxième personne du singulier.
>
> Well, why should we invent a new piece of XML machinery when we already
> have something similar with the <match> element? Basically, you want to
> search and replace the token surface form, and then tag it. I think we
> could simply adapt the syntax we already use for the synthesizer:
> <token postag="V.*" postag_regexp='yes'><match regexp_match="(.*)-tu"
> regexp_replace="$1" setpos="yes"/>[whatever you want here]</token>
>
> And this would simply apply the regexp replace and run the tagger on it.
>
> Note that this syntax is almost correct right now and LT won't complain
> about it; only weird things will happen, as it doesn't have any
> consistent semantics yet. Almost, because you need to say:
>
> <token postag="V.*" postag_regexp='yes'><match no="3"
> regexp_match="(.*)-tu" regexp_replace="$1" setpos="yes"/>[whatever you
> want here]</token>
>
> because the @no attribute is required.
>
> So basically, the functionality is almost there and it would be fairly
> easy to add it via the reference setting in our code.
>

Glad to see that <match …/> could do the job without adding new
XML attributes. I did not think of using <match …/> because I have
not yet used the synthesizer in the languages I maintain. I hope that
the feature gets added. If it does, would it require creating
a synthesizer dictionary? There is no synthesizer dictionary yet for the
languages that I maintain, but I can look into adding one.
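
To make the intended semantics concrete, here is a minimal Python sketch
of the check I have in mind. It is not LanguageTool code: TOY_TAGS,
group_has_tag and is_suspicious are hypothetical names, and the tag
strings are made up for illustration, loosely following the "V.* 2 .*"
pattern above. It matches the whole token against a regexp, takes group 1,
and probes the POS tags of that group only, so the hyphen is not special:

```python
import re

# Hypothetical toy tag dictionary (an assumption for this sketch).
# A real tagger would look words up in a morphological dictionary.
TOY_TAGS = {
    "peux": ["V ind pres 1 s", "V ind pres 2 s"],
    "peut": ["V ind pres 3 s"],
}


def group_has_tag(token, token_regex, tag_regex, tags=TOY_TAGS):
    """Return True if group 1 of token_regex (matched against the whole
    token) has at least one POS tag that fully matches tag_regex."""
    m = re.fullmatch(token_regex, token, flags=re.IGNORECASE)
    if m is None:
        return False
    word = m.group(1).lower()
    return any(re.fullmatch(tag_regex, tag) for tag in tags.get(word, []))


def is_suspicious(token):
    """Flag 'X-tu' when X is a known verb form but has no 2nd person tag."""
    is_verb = group_has_tag(token, r"(.*)-tu", r"V.*")
    is_second_person = group_has_tag(token, r"(.*)-tu", r"V.* 2 .*")
    return is_verb and not is_second_person


for tok in ["Peux-tu", "Peut-tu", "Peut-il"]:
    print(tok, is_suspicious(tok))
```

With this toy dictionary only "Peut-tu" is flagged: "Peux-tu" has a 2nd
person reading, and "Peut-il" does not match the token regexp at all.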

Regards
Dominique
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel