Hi,

On 2014-12-12 at 18:24, Elie Naulleau wrote:
> Hi all,
>
> I am just discovering LT and I am getting interested in its possibilities.
>
> I have been auditing/evaluating correction software for a company
> looking for style correction.
> It is called LELIE and is based on the Dislog language, a layer on top of
> Prolog (Commons licence).
> It is a more powerful approach than LT, but it has its drawbacks
> (complexity, maintenance cost, the need for formal training to maintain it,
> logic programming in Prolog; lexicon, rules, reasoning, everything is in
> Prolog, etc.
> http://www.irit.fr/~Patrick.Saint-Dizier/publi_fichier/manuelV1.pdf )
> Linguistically, it relies on rhetorical structures (RST,
> http://www.sfu.ca/rst/01intro/intro.html ).
> It is able to recognize semantic functions like circumstance, concession,
> condition, evaluation, etc.
> Its performance in terms of speed is not spectacular (deep parsing,
> Prolog backtracking), but it is usable.
> Some publications, in case you are curious:
> http://www.irit.fr/recherches/ILPL/lelie/accueil.html
> http://dl.acm.org/citation.cfm?id=2388653
> http://anthology.aclweb.org/C/C14/C14-2006.pdf
> https://liris.cnrs.fr/inforsid/sites/default/files/2012_6_1-PatrickSaint-Dizier.pdf
>
> The reason for this email is that I am looking for an alternative.
>
> I would like to be able to answer the following questions:
>
> - Is LT able to recognize complex structures, such as the passive form, or
> structures with a gap in the middle? (I assume so, since it seems able to do
> regex on patterns of parts of speech.)
Yes, to some extent. We can define discontinuous patterns (with the help
of skipping).

> - Is LT able to take a provided SKOS (or similar) thesaurus into account
> in order to pre-recognize multi-word terms?

No, but we have some support for tagging multi-word terms. It should be
quite easy to add another layer of annotation if it's needed.

> - How does LT do part-of-speech tagging (ML models, another approach,
> TreeTagger, etc.)?

By using a morphosyntactic lexicon and manually created disambiguation
rules. It uses statistical models for Chinese and Japanese.

> Is it conceivable to plug in one's own POS tagger (for
> instance, the Stanford NLP Tools tagger)?

It is, but we don't recommend it. These taggers assume grammaticality,
and they don't show the actual wrong POS tags but the ones that should
be there. So I really prefer writing rules manually, as they can be
easily changed.

> - Is it easily extensible? (Rule templates for new forms of error
> recognition, complex syntactic patterns that would require their own
> implementation?)

I think so.

> - Can it cope with structural information (XML tags)? Here is an example:
> enumerations. One could say that all items of an enumeration should
> begin with the same form (infinitive verb, or noun, whatever). To verify
> this, the structure of the document must be taken into account. If the
> document is available in XML with structure information, is it
> conceivable for LT to process such a document (does its architecture
> allow this, if it is not possible yet)?

Not possible yet, as we don't have this layer of information. But in
principle, it should be easy to add. Our problem was that it's hard to
have a self-documenting example that checks whether it works (we have
examples for regression testing and for documentation; adding any
styling or enumeration in pure text is difficult). But this is not
rocket science: we could probably have additional style annotations for
examples.
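To illustrate the "skipping" mechanism mentioned above: a pattern rule in
LT's grammar.xml can match discontinuous structures such as a passive
construction with adverbs in the middle. The sketch below is hypothetical
(the rule id, message, and the skip distance of 3 are made up for the
example, and it assumes English Penn Treebank POS tags); it matches any
inflected form of "be" followed, within up to three intervening tokens,
by a past participle (tag VBN), e.g. "was quickly and quietly taken":

```xml
<rule id="PASSIVE_WITH_GAP_EXAMPLE" name="Passive form with a gap">
  <pattern>
    <!-- any inflected form of "be"; skip up to 3 tokens before the next match -->
    <token inflected="yes" skip="3">be</token>
    <!-- past participle (Penn Treebank tag VBN) -->
    <token postag="VBN"/>
  </pattern>
  <message>Possible passive construction.</message>
</rule>
```

How such a rule interacts with real sentences is best checked with the
regression-test examples that rule files normally carry.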
>
> Another topic:
>
> Do you know BlackLab (based on Lucene)?
> https://github.com/INL/BlackLab/wiki/Features
> It can look for patterns (like LT rules) in very large amounts of text
> (thanks to Lucene) and get almost immediate answers.
> It can process annotated text (part of speech, up to 10 or more types of
> linguistic information: semantics, tonalities, etc.).
> I have been playing with it, and I think it could be a good help for doing
> statistics on syntactic patterns from large corpora, in order, maybe, to
> infer correction rules from a corpus of incorrect sentences.

We use Lucene for regression checks on Wikipedia and large corpora.

Best regards,
Marcin

> Sorry, I have not yet read the full LT documentation, but I thought I
> could save some time by submitting a question on the dev mailing list.
>
> Thank you,
>
> Cheers,
> Elie Naulleau
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel