Jonas Fromseier Mortensen
<[email protected]> writes:

> Hi Fran 
>
> I've submitted my proposal on the wiki.
> http://wiki.apertium.org/wiki/User:Jonasfromseier/GSoC_2013_Application:_%22Danish-Norwegian_(Bokm%C3%A5l)_language_pair%22
>
>     1) Instead of just making it nb-da, make it no-da with support for
>     analysis/generation of both Bokmål and Nynorsk. Unhammer might
>     have more ideas on this.
>
> Would it go no>nb>da or should the platform do both simultaneously? As
> I'm brand new to this I'd rather be realistic and do one language at a
> time. 
>
>     
>     2) Take the Oslo-Bergen constraint grammar for Bokmål[1] and
>     "convert/port" it to Danish. I'm sure many of the rules could be
>     reused, but they would need to be adapted to Danish words/tags.
>     
>
> That sounds like a great idea! I'll incorporate that.
>
>     3) For generating the bilingual dictionary try using cognates. 
>     
>
> Not sure how this is done yet. Is there a script?
>
> My own point, which you mentioned on IRC:
> 4) Bidirectionality:
> Do students normally finish a bidirectional pair during GSoC? I'd be worried
> about doing grammaticality judgements for generated nb text. I think
> you need a native speaker for that. It'd be hard for me to judge
> whether the form is obscure, especially since Norwegian and Danish
> were so close a hundred years ago and some forms are still used but
> considered archaic. I don't want the generated Norwegian text to be a
> hybrid. 

Yeah, I'd say do no→da first.

> Basically my proposal is to do a rock-solid nb>da pair for starters,
> including porting the CG, extending the monodixes and bidix and then
> see if I have time for Nynorsk support and bidirectionality. How does
> that sound?

Supporting nn→da as well should require only minor additions, mostly to
your make system. You'd have two modes generated, nn→da and nb→da, which
would share everything that comes _after_ bidix (i.e. structural transfer
and da generation). On the nn/nb side of the pipeline, you could
probably take the monodixes, prob files and CGs from apertium-nn-nb
without changes.
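
To make the sharing concrete, here is a rough sketch in Python of how
the two modes line up. The stage descriptions are just illustrative
labels I made up for this mail, not real file names or commands; the
actual pipeline would be defined in your modes file and make rules.

```python
# Stages that come after bidix lookup are identical for both modes.
SHARED_TAIL = ["structural transfer", "da generation"]


def stages(src):
    """Return the (illustrative) list of stages for the src->da mode.

    `src` is "nn" or "nb". Only the first three stages depend on it.
    """
    front = [
        f"{src} morphological analysis (monodix)",
        f"{src} disambiguation (CG + prob)",
        f"{src}-da bidix lookup",
    ]
    return front + SHARED_TAIL


if __name__ == "__main__":
    for src in ("nn", "nb"):
        print(f"{src}-da:", " | ".join(stages(src)))
```

Only the front part differs between the two modes, which is why the
extra work ends up mostly in the build setup.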

The only remaining thing is bidix. Here I would keep one master bidix
from no (nn+nb) to da, which an XSLT script splits into two separate
bidixes (nn→da and nb→da) before compilation. Most entries would be the
same, but some would be marked nn-only or nb-only. This kind of thing
happens in a lot of Apertium pairs, and should be no trouble to set up.
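
As a rough illustration of the splitting step, here is a minimal sketch
in Python rather than XSLT, just to show the logic. It assumes entries
are marked with an attribute I've called `only` ("nn" or "nb"); that
attribute name and the sample entries are hypothetical, so use whatever
marking convention the real master bidix ends up with.

```python
import xml.etree.ElementTree as ET

# Tiny made-up master bidix; `only` is a hypothetical marker attribute.
MASTER = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>hus<s n="n"/></l><r>hus<s n="n"/></r></p></e>
    <e only="nn"><p><l>ikkje<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
    <e only="nb"><p><l>ikke<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
  </section>
</dictionary>"""


def split_bidix(master_xml, variant):
    """Return the bidix for one variant ("nn" or "nb") as an XML string.

    Unmarked entries are kept for both variants; marked entries are kept
    only when the marker matches. The marker is stripped from the output.
    """
    root = ET.fromstring(master_xml)
    for section in root.iter("section"):
        for entry in list(section.findall("e")):
            only = entry.attrib.pop("only", None)
            if only is not None and only != variant:
                section.remove(entry)
    return ET.tostring(root, encoding="unicode")


def left_sides(bidix_xml):
    """Helper for inspection: the lemma text of every left side."""
    return [l.text for l in ET.fromstring(bidix_xml).iter("l")]


if __name__ == "__main__":
    for variant in ("nn", "nb"):
        print(variant, left_sides(split_bidix(MASTER, variant)))
```

You would run this (or the XSLT equivalent) once per variant from the
makefile, and compile each result as that mode's bidix.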



-- 
Kevin Brubeck Unhammer

Written with baby on lap, please excuse my brevity.


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
