On 2014-07-08 08:15, R.J. Baars wrote:
> I think the best way is not to use Hunspell directly, but to use Hunspell
> to help you create a PT_PT postag dictionary.
>
> That will help, not just in the current rules, but in lots of others as
> well. Postags are quite easy to use in the grammar xml.
>
> Marcin, Daniel, am I correct in this?

Well, if you're asking about Hunspell, then converting the Hunspell
dictionary is the most tedious way. Marco can simply use a ready-made
dictionary from FreeLing. Working from Hunspell is like trying to build
a car from beer cans: in principle you can do it, but it's a lengthy
and costly process. And there's a ready-made file up for grabs.

The Portuguese dictionary is already built. We simply haven't included 
it yet because we usually start from a certain number of rules, and then 
add the tagger. Using the tags in rules is a very good idea overall.

Marcin

>
> Ruud
>
>> Yes, the pt_PT .DIC has lots of
>>
>> +CAT=adj,G=m,N=s
>>
>>
>> but, I don't know how to code that into grammar.xml *yet* :-P
>>
>> I hope that Catalan does, so that I can see how it works?
>>
>> Thanks!
>>
>> Kind regards,
>>        >Marco A.G.Pinto
>>          -----------------------
>>
>> On 08/07/2014 07:00, R.J. Baars wrote:
>>> Looks like you have been using the .dic file mostly.
>>>
>>> When I look at the line in the .aff file that says:
>>>
>>> SFX r   ogia            ógico           logia           +CAT=adj,G=m,N=s
>>>
>>> I see that the suffix coded r in this case seems to produce an
>>> adjective, masculine, singular?
>>> (I don't know any Portuguese, but I do know Hunspell.)
>>>
>>>
>>> I am quite sure that this file can be used to tag a huge list of
>>> Portuguese words.
>>>
>>> This way, a portuguese tagging dictionary could be generated.
>>>
>>> One would need a big Portuguese word list (which I have), this affix
>>> file, and more knowledge of Portuguese.
>>>
>>> Sounds feasible ...
>>>
>>> Ruud
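
To make the reading of that affix line concrete, here is a minimal sketch in Python. It assumes the usual Hunspell suffix-rule layout (flag, characters to strip, characters to add, condition, then morphological fields) and assumes the `+CAT=...,G=...,N=...` field is a comma-separated list of key=value pairs, which is a convention of this pt_PT dictionary rather than something Hunspell defines. The function names and the test word "biologia" are hypothetical and do not come from the thread.

```python
import re


def parse_sfx_rule(line):
    """Split one Hunspell SFX rule line into its fields (sketch only)."""
    parts = line.split()
    # A rule line has at least: SFX, flag, strip, add, condition.
    # The class header line ("SFX r Y <count>") has only four fields.
    if len(parts) < 5 or parts[0] != "SFX":
        raise ValueError("not an SFX rule line: %r" % line)
    _, flag, strip, add, condition = parts[:5]
    tags = {}
    # Assumption: trailing fields look like "+CAT=adj,G=m,N=s".
    for field in parts[5:]:
        for pair in field.lstrip("+").split(","):
            key, sep, value = pair.partition("=")
            if sep:
                tags[key] = value
    return {
        "flag": flag,
        "strip": "" if strip == "0" else strip,  # "0" means strip nothing
        "add": "" if add == "0" else add,        # "0" means add nothing
        "condition": condition,
        "tags": tags,
    }


def apply_rule(rule, word):
    """Return the derived form, or None if the rule does not apply."""
    # Hunspell conditions use a small regex-like syntax matched at the
    # end of the word; simple ones can be handed to `re` unchanged.
    if not re.search("(?:" + rule["condition"] + ")$", word):
        return None
    if not word.endswith(rule["strip"]):
        return None
    stem = word[: len(word) - len(rule["strip"])]
    return stem + rule["add"]


line = "SFX r   ogia            ógico           logia           +CAT=adj,G=m,N=s"
rule = parse_sfx_rule(line)
print(rule["tags"])
print(apply_rule(rule, "biologia"))
```

Pairing each derived form with the `tags` dictionary is the rough idea behind generating a tagging dictionary from a word list; how those fields would map onto the postags that LanguageTool expects is left open here.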
>>>
>>>
>>>
>>>> Ruud!
>>>>
>>>> Thanks for your help.
>>>>
>>>> I have seen .AFF files since I am adding words to the en_GB for Mozilla
>>>> and OpenOffice.
>>>>
>>>> But, basically all I do in en_GB is to add words and codes in front of
>>>> them to generate more words, for example:
>>>> *store/S* will generate:
>>>> 1) store
>>>> 2) stores
>>>> and I have a user guide for each letter code.
>>>>
>>>> I noticed that the pt_PT Hunspell .DIC has lots of capital letters in
>>>> front of the words+codes, after a kind of TAB character which is used
>>>> to separate them.
>>>>
>>>> This is all I know :-P
>>>>
>>>> I guess I must try Catalan... I wanted to do it tomorrow but I have a
>>>> dentist appointment in the morning and in the afternoon I will be at
>>>> the university.
>>>>
>>>> I will try to have a look at it the moment I have some free time.
>>>>
>>>> Thank you all once again!
>>>>
>>>> Kind regards,
>>>>           >Marco A.G.Pinto
>>>>             -----------------------
>>>>
>>>>
>>>> On 07/07/2014 21:14, R.J. Baars wrote:
>>>>> There is some basic morphological data in the affix file of Hunspell.
>>>>> The Hunspell flags seem to be assigned per word type.
>>>>>
>>>>> If that has been done correctly, postags could be derived from the
>>>>> flags ...
>>>>> It might be rough, but may also be just enough.
>>>>>
>>>>> If you have never read an affix file, feel free to ask. Have a look
>>>>> at the suffixes; these are probably the most useful.
>>>>>
>>>>>
>>>>> Ruud
>>
>> --
>> ------------------------------------------------------------------------------
>> Open source business process management suite built on Java and Eclipse
>> Turn processes into business applications with Bonita BPM Community
>> Edition
>> Quickly connect people, data, and systems into organized workflows
>> Winner of BOSSIE, CODIE, OW2 and Gartner awards
>> http://p.sf.net/sfu/Bonitasoft
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>

