The University of Nijmegen (the Netherlands) has been working on tool 
combinations that do exactly that: they use machine learning on large 
corpora, for Dutch as well as English.

The Dutch front-end is called valkuil.net. More info on the tools and 
usage is here:

http://webservices-lst.science.ru.nl/

It is a heavily server-based solution.

Ruud


On 01-03-13 20:42, Paolo Bianchini wrote:
> We are facing the same issue in Italian: without understanding the context it 
> is hard to disambiguate by means of general rules. You need to get to the 
> level of specific words.
>
> I came to the conclusion that this problem should be addressed at the tagger 
> level by providing context-based tagging (at least in the first instance). 
> The tagger should use a large corpus of correct sentences and their 
> corresponding tags in order to build up a knowledge base.
>
> Moreover, the tool itself should be able to feed into the corpus additional 
> correct sentences and learn when needed.
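
As a rough illustration of the context-based tagging idea quoted above, here 
is a minimal Python sketch. It is not LanguageTool code and not the Nijmegen 
tooling. The function names, the two-sentence toy corpus, and the Penn-style 
tags are all assumptions made for the example. The tagger picks the tag most 
often seen for a word after a given previous tag, and falls back to plain word 
lookup when that context was never seen.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count tags per (previous tag, word) context and per word alone."""
    context_counts = defaultdict(Counter)  # (prev_tag, word) -> tag counts
    word_counts = defaultdict(Counter)     # word -> tag counts (fallback)
    for sentence in tagged_sentences:
        prev_tag = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            w = word.lower()
            context_counts[(prev_tag, w)][tag] += 1
            word_counts[w][tag] += 1
            prev_tag = tag
    return context_counts, word_counts

def tag(words, model, default="NN"):
    """Tag left to right, preferring the context-specific counts."""
    context_counts, word_counts = model
    tags = []
    prev_tag = "<s>"
    for word in words:
        w = word.lower()
        if (prev_tag, w) in context_counts:
            best = context_counts[(prev_tag, w)].most_common(1)[0][0]
        elif w in word_counts:
            best = word_counts[w].most_common(1)[0][0]
        else:
            best = default  # unknown word: assumed default tag
        tags.append(best)
        prev_tag = best
    return tags

# Toy corpus (made up for this sketch): "covers" is a noun after a
# determiner and a verb after a pronoun.
corpus = [
    [("The", "DT"), ("covers", "NNS"), ("are", "VBP"), ("new", "JJ")],
    [("He", "PRP"), ("covers", "VBZ"), ("the", "DT"), ("box", "NN")],
]
model = train(corpus)
print(tag(["He", "covers", "the", "covers"], model))
```

Feeding additional correct sentences into the corpus, as suggested in the 
quoted text, would amount to calling train() again on the extended corpus. A 
real tagger would need far more data and smoothing than this.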
>
> I understand that a tagger based on simple word lookup is at the base of the 
> way it works right now, but I don't think that such an implementation 
> would be incompatible with it.
>
> Ciao.
>
> Paolo
>
> On 01/mar/2013, at 14:44, "Mike Unwalla" <[email protected]> wrote:
>
>> Daniel wrote: Has anybody an idea how practical it would be to find these
>> [noun] phrases with disambiguation rules?
>>
>> Probably, you can do it, but a simple rule is unlikely to be sufficient. I
>> had a related problem when I wanted to disambiguate nouns and verbs.
>>
>> The groups of examples that follow show some problems that I had with the
>> identification of noun phrases. The target noun phrases are in CAPITAL
>> LETTERS:
>>
>> SOME THIN OIL FILTERS are not satisfactory.
>> SOME THIN OIL filters through the sand.
>>
>> THE TEMPERATURE INCREASES and decreases are small.
>> THE TEMPERATURE increases and the gas expands.
>>
>> USED PLASTIC COVERS are not satisfactory.
>> The technician used PLASTIC COVERS, not metal covers.
>>
>> The next 3 examples show a semantic problem. Without giving LT information
>> about real-world meaning, LT cannot correctly disambiguate the text.
>>
>> The technician made THE OIL FILTER from a piece of old rag.
>> The technician made THE OIL filter into a clean container.
>> The technician made THE OIL FILTER into a toy rocket for his 7-year-old son.
>>
>> To see my rules, look at the rulegroup id="POS_DISAMBIGUATION_IDENTIFY_NOUN"
>> in
>> www.simplified-english.co.uk/disambiguation-en-asdste100-issue3-2013-02-01.zip.
>> (The rules use new POS, not the default POS in LT.)
>>
>> Regards,
>>
>> Mike Unwalla
>> Contact: www.techscribe.co.uk/techw/contact.htm
>>
>>
>> -----Original Message-----
>> From: Daniel Naber [mailto:[email protected]]
>> Sent: 01 March 2013 10:28
>> To: development discussion for LanguageTool
>> Subject: finding English phrases
>>
>> Hi,
>>
>> one of the significant sources of false alarms in English is the fact that
>> LT doesn't properly handle phrases. For example:
>>
>> "There are several cargo and passenger ferries."
>>
>> leads to an error because only "several cargo" is considered and LT
>> requires "several" to be followed by a plural noun. Instead, "cargo and
>> passenger ferries" should be considered one plural noun phrase.
>>
>> Has anybody an idea how practical it would be to find these phrases with
>> disambiguation rules? One could do this (just an example, it doesn't fully
>> cover the example above):
>>
>>     <rule id="NNPS_PHRASE1" name="plural noun phrase">
>>         <pattern>
>>             <marker>
>>                 <token postag="NN"></token>
>>             </marker>
>>             <token postag="NNS"></token>
>>         </pattern>
>>         <disambig action="add"><wd pos="NNPS_PHRASE_START"/></disambig>
>>     </rule>
>>
>> Then the rules that now look for plural nouns would have to be changed to
>> look for NNPS_PHRASE_START.
>>
>> Is there a way to get "longest match" with disambiguation rules? It seems
>> to me it's at least difficult to remove shorter phrases inside longer
>> phrases.
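
To make the "longest match" question quoted above concrete, here is a small 
Python sketch of the idea outside LanguageTool. It does not use the 
disambiguation rule syntax or any LanguageTool API. The function name, the set 
of tags allowed inside a phrase, and the hand-assigned tags in the example are 
assumptions. The scan starts at a noun, extends over nouns and conjunctions, 
and closes the phrase at the last plural noun it reached, so shorter phrases 
inside a longer one are never reported separately.

```python
# Tags allowed inside a phrase (assumption: nouns and coordinating
# conjunctions only; adjectives and determiners are left out on purpose).
PHRASE_TAGS = {"NN", "NNS", "CC"}

def longest_plural_noun_phrases(tagged):
    """Return (start, end) token spans, end exclusive, of the longest
    noun/conjunction runs that start with a noun and end in a plural noun."""
    spans = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] not in ("NN", "NNS"):
            i += 1
            continue
        j = i
        end = None
        # Extend as far as possible, remembering the last plural noun seen.
        while j < len(tagged) and tagged[j][1] in PHRASE_TAGS:
            if tagged[j][1] == "NNS":
                end = j + 1
            j += 1
        if end is None:
            i = j  # no plural noun in this run, so skip the whole run
        else:
            spans.append((i, end))
            i = end  # continue after the phrase: no nested shorter matches
    return spans

# Daniel's example sentence, tagged by hand for this sketch.
sentence = [
    ("There", "EX"), ("are", "VBP"), ("several", "JJ"), ("cargo", "NN"),
    ("and", "CC"), ("passenger", "NN"), ("ferries", "NNS"), (".", "."),
]
for start, end in longest_plural_noun_phrases(sentence):
    print(" ".join(word for word, _ in sentence[start:end]))
```

On this sentence the only span found covers "cargo and passenger ferries". 
The sketch assumes the tags are already unambiguous, which is exactly what 
the CAPITAL LETTERS examples in this thread show is not always the case.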
>>
>> Any ideas or actual rules for this are very welcome. I think this is one of
>> the remaining major problems for English (and actually not only English).
>>
>> Regards
>> Daniel
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_feb
>> _______________________________________________
>> Languagetool-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

