Re: [Apertium-stuff] Stop merging lines

mansur Thu, 08 Nov 2018 23:55:00 -0800

One more example:

- Фәнис Яруллин �
- Фәнис Яруллинга багышланган чараларның һәрберсендә катнашырга тырышам, -
диде әдипнең дусты Мохтар Афзалов.


^-/-<guio>$ ^Фәнис/Фәнис<np><ant><m><nom>$
^Яруллин/Яруллин<np><cog><m><nom>$ �-/-<guio>$
^Фәнис/Фәнис<np><ant><m><nom>$ ^Яруллинга/Яруллин<np><cog><m><dat>$
^багышланган/багышла<v><tv><pass><gpr_past>$ ^чараларның/чара<n><pl><gen>$
^һәрберсендә/*һәрберсендә$ ^катнашырга/катнаш<v><tv><inf>$
^тырышам/тырыш<v><tv><pres><p1><sg>$^,/,<cm>$ ^-/-<guio>$
^диде/ди<v><tv><ifi><p3><sg>$ ^әдипнең/әдип<n><sg><gen>$
^дусты/дуст<n><sg><px3sp><nom>$ ^Мохтар/Мохтар<np><ant><m><nom>$
^Афзалов/Афзалов<np><cog><m><nom>+и<cop><aor><p3><sg>$^./.<sent>$

Here it happens because of a broken character... But why?
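
One way to narrow this down, as a rough sketch only: list the lines that do
not decode as UTF-8 and dump their raw bytes. This assumes the corpus is
meant to be UTF-8 and that iconv and od are installed. The function names
are made up for illustration, and corpus.txt is just the file name used in
the pipeline quoted below.

```shell
#!/bin/sh
# Sketch: locate lines that are not valid UTF-8 and inspect their bytes.
# Assumption: the input is supposed to be UTF-8 text.

# Print the numbers of the lines in file $1 that fail to decode as UTF-8.
# iconv exits with a non-zero status when it meets an undecodable sequence.
invalid_utf8_lines() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            echo "$n"
        fi
    done < "$1"
}

# Print line number $2 of file $1 as hexadecimal bytes.
dump_line_bytes() {
    sed -n "${2}p" "$1" | od -An -tx1
}

# Example use (not run here):
#   invalid_utf8_lines corpus.txt
#   dump_line_bytes corpus.txt 1
```

A line that is not flagged but still displays as a broken character most
likely contains a literal U+FFFD, which shows up in the dump as the bytes
ef bf bd. Neither check explains by itself why the lines get merged; it only
shows what the analyser actually receives.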


On Fri, 9 Nov 2018 at 10:24, mansur <6688...@gmail.com> wrote:

> Hello!
>
> > I don't think so, I think Mansur wants the tagger to disambiguate
> > according to the context, but have it in line-by-line output, like
> > TreeTagger or UDPipe
>
> Fran, no, no, I don't think so, Kevin was right :) I think the tagger should
> not disambiguate across lines, because in a corpus different lines are
> sometimes taken from different texts, so lines should be absolutely
> independent for a tagger.
>
> By the way, I found an example of actual line merging:
>
> һәм бу очракта "җиң сызганып" туры мәгънәдә 😉
> Кибеткә бара идем.
>
> ^һәм/һәм<cnjcoo>$ ^бу/бу<prn><dem><nom>$ ^очракта/очрак<n><sg><sg><loc>$
> ^"/"<sent>$^җиң сызганып/җиң сызган<v><tv><gna_perf>$^"/"<sent>$
> ^туры/туры<adj>$ ^мәгънәдә/мәгънә<n><sg><sg><loc>$
> �^Кибеткә/Кибет<n><sg><sg><dat>$ ^бара/бар<v><tv><pres><p3><sg>$
> ^идем/и<cop><ifi><p1><sg>$^./.<sent>$
>
> Best!
> Mansur
>
>
> On Thu, 8 Nov 2018 at 23:05, saurabh dubey <
> sauvzi13...@gmail.com> wrote:
>
>> Hello sir,
>> I am a student from JIIT Noida, India. Currently, I'm working on deep
>> learning, specifically on NLP (natural language processing) and NMT
>> (neural machine translation).
>> As your open-source organization has been contributing to this field for
>> a very long time, you could be a great mentor for me, and your
>> guidance would be really valuable to me. I really want to work in this
>> field and want to learn more.
>> *I have a little knowledge of the field. I have already worked on a
>> few small projects of my own, as listed below:*
>> *- Sentiment analysis*
>> *- Created a chatbot using a deep NLP model in TensorFlow and Python.*
>> * - A few things learned during the process are:*
>> *   1. Type of Natural Language Processing*
>> *   2. Seq2Seq Architecture & Training*
>> *   3. End to End Deep learning models*
>> *   4. Beam search decoding.*
>>
>> I would love to learn and then contribute to Apertium.
>>
>> *There are some ideas on which we can work:*
>>
>> * 1. A chatbot for your website for Q&A.*
>> * 2. India has about 23 official languages, and I would love to work
>> on any of them to extend your spectrum.*
>> * 3. An additional toolbox with the following features:*
>> *      -Copy*
>> *      -Share*
>> *      -Text-to-speech recognition.*
>>
>> Kindly assist me in this process, as I am really dedicated to and focused
>> on this field and would love to assure you of my commitment.
>> *I hope you acknowledge my efforts. *
>>
>> On Thu, Nov 8, 2018 at 7:39 PM Kevin Brubeck Unhammer <unham...@fsfe.org>
>> wrote:
>>
>>> Francis Tyers <fty...@prompsit.com> wrote:
>>>
>>> [...]
>>>
>>> >>> That would be a good feature, but wouldn't get past the issue of the
>>> >>> tagger/cg. E.g. if we do that then the tagger can't take into account
>>> >>> context.
>>> >>
>>> >> Isn't that the whole point? (Ie. treat each line as completely
>>> >> independent, no context.)
>>> >
>>> > I don't think so, I think Mansur wants the tagger to disambiguate
>>> > according to the context, but have it in line-by-line output, like
>>> > TreeTagger or UDPipe etc.
>>>
>>> Well, it's only lt-proc doing the moving, so just move the NUL-deletion
>>> before cg-proc:
>>>
>>>    cat corpus.txt                     \
>>>    | tr -d '\0'                       \
>>>    | apertium-deshtml -n              \
>>>    | sed 's/\[$/[][/; s/^]/]\x00/'    \
>>>    | lt-proc -z -w 'tat.automorf.bin' \
>>>    | tr -d '\0'                       \
>>>    | cg-proc -z  'tat.rlx.bin'        \
>>>    | cg-proc -z -w -1 'dev/mansur.bin' \
>>>    | apertium-rehtml-noent
>>>
>>> Now only lt-proc should treat end-of-line as a stream delimiter.
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>

