Hi,

On 2014-12-12 at 18:24, Elie Naulleau wrote:
> Hi all,
>
> I am just discovering LT and I am getting interested in its possibilities.
>
> I have been auditing/evaluating correction software for a company
> looking for style correction.
> It is called LELIE; it is based on the Dislog language, a layer on top
> of Prolog (Commons licence).
> It is a more powerful approach than LT, but it has its drawbacks
> (complexity, maintenance cost, the need for formal training to maintain
> it; logic programming in Prolog: lexicon, rules, reasoning, everything
> is in Prolog, etc.
> http://www.irit.fr/~Patrick.Saint-Dizier/publi_fichier/manuelV1.pdf )
> Linguistically, it relies on rhetorical structures (RST,
> http://www.sfu.ca/rst/01intro/intro.html )
> It is able to recognize semantic functions like circumstance,
> concession, condition, evaluation, etc.
> Its performance in terms of speed is not spectacular (deep parsing,
> Prolog backtracking), but it is usable.
> Some publications in case you are curious:
> http://www.irit.fr/recherches/ILPL/lelie/accueil.html
> http://dl.acm.org/citation.cfm?id=2388653
> http://anthology.aclweb.org/C/C14/C14-2006.pdf
> https://liris.cnrs.fr/inforsid/sites/default/files/2012_6_1-PatrickSaint-Dizier.pdf
>
>
> The reason for this email is that I am looking for an alternative.
>
> I would like to be able to answer the following questions:
>
> - Is LT able to recognize complex structures, such as the passive form,
> or structures with gaps in the middle? (I assume so, since it seems
> able to do regex on patterns of parts of speech.)

Yes, to some extent. We can define discontinuous patterns (with the help 
of skipping).
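
For example, pattern rules in grammar.xml can match across intervening
tokens with the `skip` attribute. A minimal sketch (the rule id, tokens,
and message here are invented for illustration, not taken from an actual
rule file):

```xml
<!-- Hypothetical rule: matches "take ... into account" even when up to
     five tokens (e.g. a noun phrase) sit between the verb and "into". -->
<rule id="EXAMPLE_DISCONTINUOUS" name="Discontinuous pattern sketch">
  <pattern>
    <token inflected="yes" skip="5">take</token>
    <token>into</token>
    <token>account</token>
  </pattern>
  <message>Example message shown when the discontinuous pattern matches.</message>
</rule>
```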

> - Is LT able to take into account a provided SKOS (or similar)
> thesaurus in order to pre-recognize multi-word terms?

No, but we have some support for tagging multi-word terms. It should be 
quite easy to add another layer of annotation if it's needed.

> - How does LT do part-of-speech tagging (ML models, another approach,
> TreeTagger, etc.)?

By using a morphosyntactic lexicon and manually created disambiguation 
rules. It uses statistical models for Chinese and Japanese.
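
The disambiguation rules live in disambiguation.xml and reuse the same
pattern syntax. A rough sketch (the rule id and tags are invented;
assuming the `filter` action, which keeps only the readings matching the
given POS tag):

```xml
<!-- Hypothetical disambiguation rule: after a determiner, drop the verb
     reading of a noun/verb-ambiguous token, keeping the noun reading. -->
<rule id="EXAMPLE_DET_NOUN" name="Determiner + ambiguous token: keep noun">
  <pattern>
    <token postag="DT"/>
    <marker>
      <token postag="NN|VB" postag_regexp="yes"/>
    </marker>
  </pattern>
  <disambig action="filter" postag="NN"/>
</rule>
```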

> Is it conceivable to plug in one's own POS tagger (for
> instance, the Stanford NLP Tools tagger)?

It is, but we don't recommend it. Statistical taggers assume 
grammaticality, so rather than exposing the actual wrong POS tags, they 
assign the ones that should be there. So I really prefer writing 
disambiguation rules manually, as they can be easily changed.

> - Is it easily extensible? (rule templates for new forms of error
> recognition, complex syntactic patterns that would require their own
> implementation)

I think so.

> - Can it cope with structural information (XML tags)? Here is an
> example: enumerations. One could say that all items of an enumeration
> should begin with the same form (an infinitive verb, or a noun,
> whatever). To verify this, the structure of the document must be taken
> into account. If the document is available in XML with structure
> information, is it conceivable for LT to process such a document (does
> its architecture allow this, if it is not possible yet)?

Not possible yet, as we don't have this layer of information. But in 
principle, it should be easy to add. Our problem was that it's hard to 
have a self-documenting example that checks whether it works (we have 
examples for regression testing and for documentation; adding any 
styling or enumeration in pure text is difficult).

But this is not rocket science: we could probably have additional style 
annotations for examples.

>
> Another topic :
>
> Do you know BlackLab (based on Lucene)
> https://github.com/INL/BlackLab/wiki/Features ?
> It can look for patterns (like LT rules) in very large amounts of text
> (thanks to Lucene) and get almost immediate answers.
> It can process annotated text (part of speech, and up to 10 levels of
> other types of linguistic information: semantics, tonalities, etc.).
> I have been playing with it, and I think it could be a good help for
> doing statistics on syntactic patterns from a large corpus, in order,
> maybe, to infer correction rules from a corpus of incorrect sentences.

We use Lucene for regression checks on Wikipedia and other large corpora.

Best regards,
Marcin

>
>
> Sorry, I have not yet read the full LT documentation, but I thought I
> could save some time by submitting a question on the dev mailing list.
>
> Thank you,
>
> Cheers,
> Elie Naulleau
>
>
>
>


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
