On Thu, 09 Aug 2012 at 10:35 +0200, Per Tunedal wrote:
> Hi,
> I consider Apertium suitable for translating the pair Swedish -
> Norwegian for the following reasons:
> 
> 1. They are closely related.
>  
> 2. You don't have an abundance of free bilingual resources, as Norway
> doesn't belong to the EU. Thus, a statistical approach would be difficult.
> 
> 3. You might use a level 1 translation (without constraint grammar),
> like the pair Swedish - Danish. In that case, you could make the
> translation usable for a wide audience by adding the pair to Apertium
> Caffeine and the new OmegaT plug-in.

In any case there is no free constraint grammar of Swedish currently available.

> Is anyone working on the pair at the moment? I might start some work
> to begin familiarizing myself with Apertium.

No-one is currently working on the pair.

> Some considerations:
> 
> A. Written Norwegian is in fact two different languages; Bokmål (nb) and
> Nynorsk (nn). If I simplify a lot, the former is basically Danish
> written by Norwegians (some words are completely different from Danish)
> and the latter is a codification of the spoken traditional Norwegian
> (different words and a more complicated grammar). Both languages are
> official in Norway, but some variant is preferred in certain areas and
> by certain individuals. However, Bokmål is the dominating variant (80-90
> %).
> 
> How to handle this, when translating from Norwegian to Swedish? If a
> user encounters some text in Norwegian, he doesn't know if it's Bokmål
> or Nynorsk. He just surfed to some page with some interesting facts
> about bird watching, cod fishing, hiking in the mountains or whatever
> he is interested in. He just wants to translate the content.

There are three possibilities. 

(1) You can make an sv-nb translator, and then include a subset of the
nn-nb translator in it, piping the output of sv-nb into nb-nn to get
sv-nn. (Here you would have an sv-nb dictionary and an nb-nn dictionary.)

(2) You make two translators in parallel.

(3) You make the two translators in the one pair. For this, you could
have the same Swedish dictionary, but would need different nb and nn
dictionaries, different sv-nb and sv-nn dictionaries and different sv-nb
and sv-nn transfer rules.

I think that (3) is probably best, but would like input from others
(e.g. Unhammer or Trond).
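
For comparison, option (1) amounts to chaining two translators. A
minimal sketch, assuming the `apertium` command-line tool is installed
and that modes named `sv-nb` and `nb-nn` exist (`sv-nb` is hypothetical,
since the pair has not been built yet). It chains whole translators,
which is cruder than embedding a subset of nn-nb inside one mode:

```python
import subprocess


def apertium_stage(mode):
    """Return a function that pipes text through `apertium <mode>`."""
    def run(text):
        result = subprocess.run(
            ["apertium", mode],
            input=text,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    return run


def chain(*stages):
    """Compose stages so that each one receives the previous one's output."""
    def run(text):
        for stage in stages:
            text = stage(text)
        return text
    return run


# Hypothetical usage, once both modes are installed:
# sv_to_nn = chain(apertium_stage("sv-nb"), apertium_stage("nb-nn"))
# print(sv_to_nn("Jag vet inte."))
```

Any errors made by the first stage are passed on to the second, which
is one reason to prefer a dedicated sv-nn dictionary as in (3).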

> Perhaps Apertium could do some test-translation to see if the text is
> written in Bokmål or Nynorsk? And then use the most fruitful translation
> pair for the translation to Swedish. Or just ignore Nynorsk? Wouldn't
> that be a shame?

Ignoring Nynorsk would be a great shame! Especially since it is the
favoured variant of Norwegian speakers working on Apertium ;)
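
The test-translation idea could be sketched roughly like this: run the
text through both directions and keep the variant whose output has the
fewest unknown-word marks (`*`). The function names and the sample
outputs are made up for illustration; this is not an existing Apertium
feature.

```python
import re

# An unknown word appears in Apertium output as the word prefixed with '*'.
UNKNOWN = re.compile(r"(?<!\S)\*\w+")


def unknown_count(output):
    """Count the words the translator flagged as unknown."""
    return len(UNKNOWN.findall(output))


def pick_variant(outputs):
    """Given {variant: translated text}, return the variant with the
    fewest unknown words. On a tie, the first variant listed wins."""
    return min(outputs, key=lambda variant: unknown_count(outputs[variant]))


# Made-up outputs for the Nynorsk sentence "Eg veit ikkje."
candidates = {
    "nb": "*Eg *veit *ikkje.",  # treated as Bokmål: nothing recognised
    "nn": "Jag vet inte.",      # treated as Nynorsk: fully translated
}
print(pick_variant(candidates))
```

Listing `nb` first makes Bokmål the default when the two are equally
good, which matches its being the more common variant.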

> B. I have looked in the repository and found that some work has been
> done on the following dictionaries:
> 
> Danish (da) - Norwegian Bokmål (nb) - nursery
> Swedish (sv) - Norwegian Bokmål (nb) - incubator
> 
> Tihomir told me he's working on Swedish-Icelandic and has expanded the
> Swedish monolingual dictionary from sv-da. But which is the most
> complete Norwegian Bokmål (nb) monolingual dictionnary? The one from the
> pair Norwegian Bokmål (nb) - Norwegian Nynorsk (nn)?

Yes, I would take the Swedish dictionary from sv-is and the Norwegian
dictionary (or dictionaries) from nn-nb.

> C. Is it possible to reuse some transfer rules?

The transfer rules are the least of your worries. sv-da has a grand
total of 6, and nn-nb 13. 

> If Danish and Norwegian Bokmål are very similar, perhaps it's possible
> to reuse the transfer rules da-sv from the pair Danish (da) - Swedish
> (sv) for the translation from Swedish to Norwegian Bokmål (nb)? And the
> same in the other direction (i.e. convert the transfer rules for sv-da
> to rules for sv-nb)?

Reusing transfer rules probably isn't necessary. If you don't feel like
writing them, then you can write testcases on the Wiki and ask someone
on the list to write them. 

> Perhaps the maintainer of Danish (da) - Norwegian Bokmål (nb) can give
> me a hint? He's probably well up to date on the differences between the
> two languages.

There is no maintainer that I know of. 

> D. Linguistic resources for Norwegian.
> 
> I have found frequency word lists for Norwegian Bokmål (nb) at
> http://helmer.aksis.uib.no/nta/ and can thus prioritize my work to the
> most important words.

Great.
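
A small sketch of that prioritisation: list the most frequent words
that are not yet in your dictionary. It assumes a frequency list with
one `count word` pair per line, which may not match the actual format
of the files at that site, and a plain set of words you already cover.

```python
def missing_by_frequency(freq_lines, known_words, limit=10):
    """Return up to `limit` words absent from known_words, most frequent first."""
    missing = []
    for line in freq_lines:
        parts = line.split()
        # Skip headers, blank lines and anything not shaped like "count word".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, word = int(parts[0]), parts[1].lower()
        if word not in known_words:
            missing.append((count, word))
    # Highest count first; the sort is stable, so ties keep their file order.
    missing.sort(key=lambda entry: -entry[0])
    return [word for _, word in missing[:limit]]


# Made-up sample data for illustration.
sample = ["1500 og", "1200 i", "30 torsk", "900 fjell", "# comment line"]
print(missing_by_frequency(sample, known_words={"og", "i"}))
```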

> Online dictionaries and grammatical resources can be found at the site
> of the Norwegian språkråd http://www.sprakrad.no/ .
> 
> What about corpora? I have found some bilingual data at Uppsala
> University http://opus.lingfil.uu.se/ (very low quality!). Has anyone
> found any other bilingual resources nb-sv? Any monolingual data?

For monolingual data, you have SALDO for Swedish. It basically covers
the whole of Swedish morphology and is GPL (or compatible).

For Norwegian, everything that you need is in nn-nb.

I have the Bible hand-aligned in Swedish and Bokmål; I can send it to
you if you want. You can then use the instructions here:

http://wiki.apertium.org/wiki/Extracting_bilingual_dictionaries_with_Giza%2B%2B

(when the Wiki is back up) for making a probabilistic bilingual
dictionary.  

> Does anyone know of a good tool for extracting texts from the
> internet? I have tried a lot of them, the most promising are
> Corpuscatcher (the Yahoo API is obsolete and searching from a list of
> URLs doesn't work as expected), Webharvest (I haven't figured out the
> syntax yet) and Webextractor360 (need to update my knowledge of regexp).
> Corpuscatcher would do all the steps, if it worked: find promising
> websites, download the pages and convert the pages to text.

Bitextor. But really I wouldn't bother using a parallel corpus. You'll
spend more time fiddling with it than if you'd just translated the words
by hand.

> By the way, what about copyright issues? What can I do with downloaded
> web pages? As far as I know it wouldn't be any problem to:

The ones you list are non-problematic.

...snip...

> 
> What's the practice?
> 
> E. Any advice for me if I start working on the pair Swedish (sv) -
> Norwegian Bokmål (nb)? Have I missed something I need to know? Any other
> resources I can use?

My advice would be to start small, to avoid getting overwhelmed. 

Start from scratch on a small task. For example translating this short
story: 

http://www.unilang.org/ulrview.php?res=422,416

Once you have managed to get the system to translate this without any
system errors (the @, * and # marks you see, not necessarily translation
errors), you should have a good understanding of the system and be well
placed to start working with the other resources.

It shouldn't take longer than a week, and some have done it in a couple
of days.
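
The three marks are how Apertium flags system errors in its output:
`*` for a word the analyser does not know, `@` for a lemma missing from
the bilingual dictionary, and `#` for a form the generator could not
produce. A rough sketch for turning a translated text into a to-do
list (the function name and the sample sentence are made up):

```python
import re

MARKS = {
    "*": "unknown to the analyser",
    "@": "missing from the bilingual dictionary",
    "#": "could not be generated",
}

# A flagged word is a mark at the start of a token, followed by the word.
FLAGGED = re.compile(r"(?<!\S)([*@#])(\w+)")


def system_errors(output):
    """Group the flagged words in translator output by their mark."""
    errors = {mark: [] for mark in MARKS}
    for mark, word in FLAGGED.findall(output):
        errors[mark].append(word)
    return errors


# Made-up translator output for illustration.
report = system_errors("Det var en *gång en @pojke som #heta Per.")
for mark, words in report.items():
    print(f"{mark} ({MARKS[mark]}): {words}")
```

When all three lists are empty for the whole story, you are done.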

Best of luck! 

Fran

