Hi,
thank you all for the useful comments!

On Thu, Aug 9, 2012, at 15:35, Trosterud Trond wrote:
> 
> Kevin Brubeck Unhammer kirjoitti 9. aug. 2012 kello 14:54:
> 
> > Francis Tyers <[email protected]> writes:
--snip--
> 
> >>> 3. You might use a level 1 translation (without constraint grammar),
> >>> like the pair Swedish - Danish. In that case, you could make the
> >>> translation usable for a wide audience by adding the pair to Apertium
> >>> Caffeine and the new OmegaT plug-in.
> >> 
> >> In any case there is no free constraint grammar of Swedish currently 
> >> available.
> 
> The lack of CG for Swedish is a problem. My suggestion would be to write
> one. To be a bit specific:
> To write the 100-or-so rules needed for removing the gross majority, say
> 80(?)% of the ambiguity.
> 

Tihomir has said before that he plans to start developing a constraint
grammar for Swedish.

> > What you're describing is gisting/translation for understanding; I can't
> > imagine gisting MT would be very useful for sv-nb/nn (and I suspect
> > people would use Google for that anyway).
> 
> From the Norwegian side, we cannot imagine the need for a sv-nb/nn gisting
> system. The maximum help we would need is, in rare cases, a dictionary
> translating a small number of hard words.
> 
> How hard Norwegian is for Swedes is of course up to the Swedes to judge.
> But the competition will be between understanding the Norwegian text and
> understanding (sic) the MT output.

You are right of course, I should have thought of that.

> 
> > But with these closely related
> > languages, it's possible to get to a standard good enough for
> > post-editing (pre-publishing), e.g. with OmegaT as you mentioned, and in
> > that case the users definitely know which language it is already.
> 
> Yes, a production system (say, I want to translate a sv article to nn on
> Wikipedia) is a different matter. My experience from nn-nb translation
> is that the time saved by post-editing, compared to
> rewriting/translation, is around 80%.

Yes, that was the scenario I first had in mind. But it would break down
if a constraint grammar turns out to be needed, wouldn't it? And then
there won't be any use left for the Apertium translation.

> 
> So yes, that can be a good idea. __But__ nb-nn lexicon and orthographic
> principles are the same, so more often than not unknown words will come
> out as free rides. For sv-nn/nb that will __not__ be the same (to the
> same extent), since both vocabulary and orthography deviate more. So,
> fewer free rides for unknown words. This implies that the transfer lexicon
> must be __much__ bigger than the nb-nn one in order to get the same good
> results as we have for nb-nn. The good news is that the making of such an
> enlarged transfer lexicon in part can be done automatically, and then
> manually post edited.

What do you have in mind? Please tell me more about how to generate the
bidix automatically!

And for the manual part:
Keld once told me there is a list of "false friends" for da/sv/nb.
Where can I find that list of problematic words?

> 
> >> 
> >> (3) You make the two translators in the one pair. For this, you could
> >> have the same Swedish dictionary, but would need different nb and nn
> >> dictionaries, different sv-nb and sv-nn dictionaries and different sv-nb
> >> and sv-nn transfer rules.
> 
> > (3) sounds best to me too.
> I agree.
> 
> > Perhaps you could even do with one bidix, and
> > just use the alt="nn" vs alt="nb" attribute; a rough and dirty count
> > shows that the majority of entries in the nn-nb bidix carry over the
> > same lemma/tag:
> 
> This could very well be the case, yes (cf. my experiences with free
> rides).
> 
> > That said, I would pick one first and get the system up and running,
> > then expand to both later on.
> 
> This is also a possibility, yes. But the expansion to both languages
> should be taken into account in the setup phase.
> 

How should I proceed? Say that I go for (3) with one bidix and start with
bokmål (nb).
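
To get a feeling for how much a shared bidix could reuse, the rough count
Kevin describes above could perhaps be sketched like this. It is only a
sketch under my own assumptions: the three sample entries and the names
SAMPLE_BIDIX, side_key and count_identical are made up for illustration
and are not taken from the real nn-nb bidix.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature bidix; real entries would come from the nn-nb pair.
SAMPLE_BIDIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>hus<s n="n"/><s n="nt"/></l><r>hus<s n="n"/><s n="nt"/></r></p></e>
    <e><p><l>ikkje<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
    <e><p><l>bil<s n="n"/><s n="m"/></l><r>bil<s n="n"/><s n="m"/></r></p></e>
  </section>
</dictionary>"""


def side_key(side):
    """Return (lemma, tags) for an <l> or <r> element."""
    lemma = "".join(side.itertext())
    tags = tuple(s.get("n") for s in side.iter("s"))
    return lemma, tags


def count_identical(bidix_xml):
    """Count <p> pairs whose left and right sides have the same lemma and tags."""
    root = ET.fromstring(bidix_xml)
    same = total = 0
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue  # skip malformed pairs
        total += 1
        if side_key(left) == side_key(right):
            same += 1
    return same, total


same, total = count_identical(SAMPLE_BIDIX)
print(f"{same} of {total} entries are identical on both sides")
```

Entries written with <i> (identity) instead of <p> are not counted here, so
on a real bidix this would probably underestimate the overlap.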

BTW, on my hand cream I can read "N/D Intensivt mykgjørende/blødgørende
og pleiende håndkrem/håndcreme." Looks like a similar approach for Norsk
bokmål (nb) and Danish (da)! That's why I thought of reusing the Danish
- Swedish transfer rules.

--snip--
> 
> 
> > http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar has
> > more frequency lists (they also taunt you with this enormous corpus, but
> > it's currently "in beta", very messy, and best avoided for now).
> 
> The best resource is the NoWaC corpus, it also has frequency lists, both
> for lemmata and for word forms.
> 
> My final comment would be that the work will be 
> 
> 1 in the analysis/generation of Swedish
> 2 … and in the bidix.
> 
> As for 1, we should look around in the Swedish language technology
> landscape and look for open resources, e.g. in Gothenburg (Aarne Ranta,
> also Språkbanken).

What kind of resources do I need?

> 
> As for 2, Lexin might be one resource. I am on Euralex in Oslo right now,
> and will ask around.

Fine! By the way, what's Lexin?

> 
> Trond.



_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
