Actually, what we need is a real parser. For English, there are some 
parsers that are known to be quite good:

http://aclweb.org/aclwiki/index.php?title=Parsers_for_English

What is needed is simply to interface with one of them and add some new
attributes that would be parser-independent. Maxim Mozgovoy already did some
experiments, and there were Japanese students who wanted to integrate a
Japanese parser.

This is a perfect GSoC task, as speed tests would be needed. Probably
the Berkeley parser or the Stanford parser would be just fine.

That said, for Polish there's no parser available, so I'd need to have 
some grouping abilities in the disambiguator as well to port Spejd 
grouping rules (see the link above if you're interested - unfortunately, 
they are Polish-centered).
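
As a rough illustration of what such grouping could look like, here is a minimal Python sketch. It is not LT code and not a port of the Spejd rules: the function names, the span format, and the use of Penn-style tags (NN, NNS, CC) are assumptions made only for this example. It scans left to right and always takes the longest run of nouns, stepping over a conjunction only when another noun follows, so shorter groups inside a longer one are never emitted.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); Penn-style tags are an assumption

NOUN_TAGS = {"NN", "NNS"}


def longest_noun_group(tokens: List[Token], start: int) -> int:
    """Return the exclusive end index of the longest noun group that
    begins at `start`, or `start` itself if no group begins there."""
    i = start
    end = start
    while i < len(tokens) and tokens[i][1] in NOUN_TAGS:
        i += 1
        end = i
        # Step over a conjunction only if a noun follows it
        # ("cargo and passenger ferries").
        if (i + 1 < len(tokens)
                and tokens[i][1] == "CC"
                and tokens[i + 1][1] in NOUN_TAGS):
            i += 1
    return end


def group_nouns(tokens: List[Token]) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right scan. Each span is (start, end, tag of the
    last noun), so a rule can ask whether the whole group is plural."""
    spans = []
    i = 0
    while i < len(tokens):
        end = longest_noun_group(tokens, i)
        if end > i:
            spans.append((i, end, tokens[end - 1][1]))
            i = end  # continue after the group: no nested shorter groups
        else:
            i += 1
    return spans


sentence = [("There", "EX"), ("are", "VBP"), ("several", "JJ"),
            ("cargo", "NN"), ("and", "CC"), ("passenger", "NN"),
            ("ferries", "NNS"), (".", ".")]
print(group_nouns(sentence))  # [(3, 7, 'NNS')]
```

The sketch assumes that every token already carries a single correct tag. It therefore does nothing for the ambiguous cases Mike lists below (e.g. "oil filters" as noun vs. verb); those still need real disambiguation or a parser.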

Best,
Marcin

On 2013-03-01 20:42, Paolo Bianchini wrote:
> We are facing the same issue in Italian: without understanding the context it 
> is hard to disambiguate by means of general rules. You need to get to the 
> level of specific words.
>
> I came to the conclusion that this problem should be addressed at the tagger 
> level by providing context based tagging (at least in the first instance). 
> The tagger should use a large corpus of correct sentences and their 
> corresponding tags in order to incorporate a knowledge base.
>
> Moreover, the tool itself should be able to feed into the corpus additional 
> correct sentences and learn when needed.
>
> I understand that a tagger based on simple word lookup is at the base of the 
> way it works right now, but I don't think that such an implementation 
> would be incompatible.
>
> Ciao.
>
> Paolo
>
> On 01/mar/2013, at 14:44, "Mike Unwalla" <[email protected]> wrote:
>
>> Daniel wrote: Has anybody an idea how practical it would be to find these
>> [noun] phrases with disambiguation rules?
>>
>> Probably, you can do it, but a simple rule is unlikely to be sufficient. I
>> had a related problem when I wanted to disambiguate nouns and verbs.
>>
>> The groups of examples that follow show some problems that I had with the
>> identification of noun phrases. The target noun phrases are in CAPITAL
>> LETTERS:
>>
>> SOME THIN OIL FILTERS are not satisfactory.
>> SOME THIN OIL filters through the sand.
>>
>> THE TEMPERATURE INCREASES and decreases are small.
>> THE TEMPERATURE increases and the gas expands.
>>
>> USED PLASTIC COVERS are not satisfactory.
>> The technician used PLASTIC COVERS, not metal covers.
>>
>> The next 3 examples show a semantic problem. Without giving LT information
>> about real-world meaning, LT cannot correctly disambiguate the text.
>>
>> The technician made THE OIL FILTER from a piece of old rag.
>> The technician made THE OIL filter into a clean container.
>> The technician made THE OIL FILTER into a toy rocket for his 7-year-old son.
>>
>> To see my rules, look at the rulegroup id="POS_DISAMBIGUATION_IDENTIFY_NOUN"
>> in
>> www.simplified-english.co.uk/disambiguation-en-asdste100-issue3-2013-02-01.zip.
>> (The rules use new POS, not the default POS in LT.)
>>
>> Regards,
>>
>> Mike Unwalla
>> Contact: www.techscribe.co.uk/techw/contact.htm
>>
>>
>> -----Original Message-----
>> From: Daniel Naber [mailto:[email protected]]
>> Sent: 01 March 2013 10:28
>> To: development discussion for LanguageTool
>> Subject: finding English phrases
>>
>> Hi,
>>
>> one of the significant sources of false alarms in English is the fact that
>> LT doesn't properly handle phrases. For example:
>>
>> "There are several cargo and passenger ferries."
>>
>> leads to an error because only "several cargo" is considered and LT
>> requires "several" to be followed by a plural noun. Instead, "cargo and
>> passenger ferries" should be considered one plural noun phrase.
>>
>> Has anybody an idea how practical it would be to find these phrases with
>> disambiguation rules? One could do this (just an example, it doesn't fully
>> cover the example above):
>>
>>     <rule id="NNPS_PHRASE1" name="plural noun phrase">
>>         <pattern>
>>             <marker>
>>                 <token postag="NN"></token>
>>             </marker>
>>             <token postag="NNS"></token>
>>         </pattern>
>>         <disambig action="add"><wd pos="NNPS_PHRASE_START"/></disambig>
>>     </rule>
>>
>> Then the rules that now look for plural nouns would have to be changed to
>> look for NNPS_PHRASE_START.
>>
>> Is there a way to get "longest match" with disambiguation rules? It seems
>> to me it's at least difficult to remove shorter phrases inside longer
>> phrases.
>>
>> Any ideas or actual rules for this are very welcome. I think this is one of
>> the remaining major problems for English (and actually not only English).
>>
>> Regards
>> Daniel
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_feb
>> _______________________________________________
>> Languagetool-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>

