Jonas Fromseier Mortensen writes:

> Hi Fran
>
> I've submitted my proposal on the wiki.
> http://wiki.apertium.org/wiki/User:Jonasfromseier/GSoC_2013_Application:_%22Danish-Norwegian_(Bokm%C3%A5l)_language_pair%22
>
> 1) Instead of just making it nb-da, make it no-da with support for
> analysis/generation of both Bokmål and Nynorsk. Unhammer might have
> more ideas on this.
>
> Would it go no>nb>da or should the platform do both simultaneously? As
> I'm brand new to this I'd rather be realistic and do one language at a
> time.
>
> 2) Take the Oslo-Bergen constraint grammar for Bokmål[1] and
> "convert/port" it to Danish. I'm sure many of the rules could be
> reused, but they would need to be adapted to Danish words/tags.
>
> That sounds like a great idea! I'll incorporate that.
>
> 3) For generating the bilingual dictionary try using cognates.
>
> Not sure how this is done yet. Is there a script?
>
> my own that you mentioned on IRC:
> 4) bidirectionality:
> Do students normally finish a bidirectional pair GSoC? I'd be worried
> about doing grammaticality judgements for generated nb text. I think
> you need a native speaker for that. It'd be hard for me to judge
> whether the form is obscure, especially since Norwegian and Danish
> were so close a hundred years ago and some forms are still used but
> considered archaic. I don't want the generated Norwegian text to be a
> hybrid.

Yeah, I'd say do no→da first.

> Basically my proposal is to do a rock-solid nb>da pair for starters,
> including porting the CG, extending the monodices and bidix and then
> see if I have time for Nynorsk support and bidirectionality. How does
> that sound?

Supporting nn→da as well should require only minor additions, mostly to
your make system. You'd have two modes generated, nn→da and nb→da,
which would share everything that comes _after_ bidix (i.e. structural
transfer and da generation). On the nn/nb side of the pipeline, you
could probably snatch the monodixes, prob files and CGs from
apertium-nn-nb without changes.

The only remaining thing is bidix. Here I would keep one master bidix
from no (nn+nb) to da, which is processed by an XSLT script into two
different bidixes before compilation. Most entries would be the same,
but some would be marked nn-only or nb-only. This kind of thing happens
in a lot of Apertium pairs, and should be no trouble to set up.

-- 
Kevin Brubeck Unhammer
Written with baby on lap, please excuse my brevity.
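
To make the master-bidix idea above a bit more concrete, here is a rough
sketch of the splitting step. It is written in Python rather than XSLT
purely for illustration, and it is not taken from any existing Apertium
pair: the `only` attribute, the function names and the sample entries
are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical master bidix. The "only" attribute is an assumption made for
# this sketch; it is not claimed to be part of the Apertium dictionary format.
MASTER_BIDIX = """\
<dictionary>
  <section id="main" type="standard">
    <e><p><l>hus<s n="n"/></l><r>hus<s n="n"/></r></p></e>
    <e only="nb"><p><l>ikke<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
    <e only="nn"><p><l>ikkje<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
  </section>
</dictionary>
"""


def split_bidix(master_xml, variant):
    """Return the bidix for one variant ("nn" or "nb") as an XML string."""
    root = ET.fromstring(master_xml)
    for section in list(root.iter("section")):
        for entry in section.findall("e"):
            only = entry.get("only")
            if only is None:
                continue  # shared entry: keep it in both bidixes
            if only != variant:
                section.remove(entry)  # belongs to the other variant
            else:
                del entry.attrib["only"]  # drop the marker before compilation
    return ET.tostring(root, encoding="unicode")


def left_lemmas(bidix_xml):
    """List the left-side lemmas, handy for eyeballing the result."""
    root = ET.fromstring(bidix_xml)
    return [e.find("p/l").text for e in root.iter("e")]


if __name__ == "__main__":
    for variant in ("nn", "nb"):
        print(variant, left_lemmas(split_bidix(MASTER_BIDIX, variant)))
```

The make system would then run something like this once per variant and
compile each output as the bidix for its own mode, while everything after
bidix stays shared.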
