Hi,

On Thu, Aug 9, 2012, at 23:23, Trosterud Trond wrote:
> 
> Per Tunedal wrote on 9 Aug 2012 at 20:21:
> > Tihomir has told before that he plans to start developing a constraint
> > grammar for Swedish.
> 
> Good. Again: 
> - Are there open resources?
> - Could something be ported from Norwegian? (perhaps only indirectly).
> 
> >> Yes, a production system (say, I want to translate a sv article to nn on
> >> Wikipedia) (…)
> 
> > Yes, that was the scenario I first had in mind. But it would break if
> > there is a need for a constraint grammar, wouldn't it? And then there
> > won't be any use left for the Apertium translation.
> 
> Well. Since a handful of rules will remove most ambiguities, what is left
> will be partly disambiguated. How bad this is for MT remains to be
> seen. So it will not break. It will only be more problematic, and the
> result will be poorer.

Mikel Artetxe has explained that the OmegaT plug-in doesn't work for
language pairs that depend on programs that aren't part of
lttoolbox-java. Six language pairs depend on the Constraint Grammar
package and are thus excluded; one of them is apertium-nn-nb. But sv-da
doesn't use any constraint grammar, so I concluded that sv-nb (Norsk
bokmål) wouldn't need one either and could be put to real use, by real
translators, in OmegaT. If the pair cannot be used that way, I don't see
any need to develop it.

> 
> >> The good news is that the making of such an
> >> enlarged transfer lexicon in part can be done automatically, and then
> >> manually post edited.
> > What do you have in mind? Please tell me more about how to generate the
> > bidic automatically!
> 
> a. via a parallel corpus (of course)

I thought I could do without this and work only with monolingual data,
plus, of course, the existing dictionaries and rules in other language
pairs involving Swedish (sv) and Norwegian (nb/nn).
 
> b. by
> --- take a sv list of words
> --- run it through a sv2no orthographical + lexical transfer

Tools for this? Any documentation?

> --- analyze the output, and pick the recognized matches (input N Sg ->
> collect all N Sg output)
> --- go through the result manually
> 
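
(A minimal sketch of the steps under b, in Python, just to make the idea
concrete. The function name, the toy transfer function and the tiny
Norwegian lexicon are all hypothetical stand-ins; a real run would call an
actual sv2no transducer and a Norwegian analyser instead.)

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def propose_bidix_candidates(
    sv_words: Iterable[Tuple[str, str]],
    transfer: Callable[[str], str],
    no_lexicon: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return (sv_lemma, no_lemma, tag) triples to be checked by hand."""
    candidates = []
    for sv_lemma, tag in sv_words:
        # Step 1: orthographic + lexical transfer gives a Norwegian guess.
        guess = transfer(sv_lemma)
        # Step 2: keep the guess only if the Norwegian side recognizes it
        # with the same tag as the Swedish input (e.g. noun in, noun out).
        if tag in no_lexicon.get(guess, set()):
            candidates.append((sv_lemma, guess, tag))
    # Step 3 (manual review of the result) happens outside this function.
    return candidates


# Toy data, purely illustrative.
toy_sv = [("station", "n"), ("flicka", "n")]
toy_no = {"stasjon": {"n"}}
toy_transfer = lambda w: w.replace("tion", "sjon")
pairs = propose_bidix_candidates(toy_sv, toy_transfer, toy_no)
```

Words whose guess is not recognized on the Norwegian side (here "flicka")
simply drop out and would have to be added to the bidix by hand.
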
> About the transducer:
> Lexical changes: samhälle > samfunn, the prefix o- -> u-, stad -> by (when
> these occur in compounds)
> Suffixes: -tion -> -sjon
> The obvious things: ö > ø, ä > æ, x > ks
> 
> See e.g.
> associationsrikedom, variationsrikedom, situationsrikedom,
> infektionssjukdom, kombinationsslalom, informationsergonom,
> nationalekonom, sundströmnationalekonom, konsumtionsboom,
> kommunikationsform, organisationsform, notationsform, injektionsform,
> portionsform, distributionsform, nationalsocialism, ationalism,
> nationalism, smygnationalism, multinationalism, hypernationalism,
> internationalism, vänsternationalism, naturnationalism,
> hägnainossnationalism, statsnationalism, rationalism, sensationalism,
> traditionalism, funktionalism, exceptionalism, koncentrationskapitalism,
> mutationsmekanism, isolationism, exhibitionism, perfektionism,
> protektionism, interventionism
> 
> This is a list of -tion- words. They should all have -sjon- in nb and nn.
> In addition: c > s, rikedom > rikdom, sjuk > syk (nb only), ekonom >
> økonom, social > sosial, -ism > -isme, xc > ks.
> 
> Thus a long series of small changes is needed to turn such loanword
> strings into Norwegian. In a recent frequency corpus from Svenska
> språkbanken I found 365000 unique word forms; of these, 7700 contained
> -tion, and thus need the ruleset above.
> 
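
(To see how far such a ruleset gets, here is a rough Python sketch using
ordered regular-expression replacements. It is not an existing Apertium
tool. Two things are my own assumptions: the rule order, and restricting
c > s to c before e, i or y so that other c's are left alone. The o- -> u-
prefix rule and stad -> by are left out because they need context that a
plain string rewrite does not have.)

```python
import re

# Ordered (pattern, replacement) pairs. More specific rules come first,
# e.g. "xc" must be handled before the single-letter "x" and "c" rules.
COMMON_RULES = [
    (r"samhälle", "samfunn"),
    (r"xc", "ks"),
    (r"tion", "sjon"),
    (r"rikedom", "rikdom"),
    (r"ekonom", "økonom"),
    (r"ism$", "isme"),
    (r"c(?=[eiy])", "s"),  # assumption: only rewrite c before e, i, y
    (r"x", "ks"),
    (r"ö", "ø"),
    (r"ä", "æ"),
]

# Rules that apply to bokmål (nb) but not to nynorsk (nn).
NB_ONLY_RULES = [(r"sjuk", "syk")]


def sv_to_no(word: str, target: str = "nb") -> str:
    """Apply the rewrite rules in order to one lower-case Swedish word."""
    rules = COMMON_RULES + (NB_ONLY_RULES if target == "nb" else [])
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def count_tion(word_forms) -> int:
    """Count the word forms that contain -tion- and so need the ruleset."""
    return sum(1 for w in word_forms if "tion" in w)
```

For example, sv_to_no("nationalekonom") gives "nasjonaløkonom", and
sv_to_no("infektionssjukdom", target="nn") keeps "sjuk". The output is only
a guess and still has to be checked against a Norwegian word list.
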
> > And for the manual part:
> > Keld once told me there is a list of "false friends" for da/sv/nb.
> > Where do I find that list of problematic words?
> 
> In paper dictionaries and textbooks used in the universities for learning
> your neighboring language.
> 
> >> 
> >> 1 in the analysis/generation of Swedish
> >> 2 … and in the bidix.
> >> 
> >> As for 1, we should look around in the Swedish language technology
> >> landscape and look for open resources, e.g. in Gothenburg (Aarne Ranta,
> >> also Språkbanken).
> > 
> > What kind of resources do I need?
> 
> For 1: swetwol :-) But it seems there are resources in Gothenburg:
> 
> http://www.cse.chalmers.se/alumni/markus/FM/
> http://www.cse.chalmers.se/alumni/markus/FM/download/swedish.lexicon
> 
> This might even work.

As an input for transfer rules or for a potential constraint grammar?

> 
> >> As for 2, Lexin might be one resource. I am at Euralex in Oslo right now,
> >> and will ask around.
> > Fine! Besides, what's Lexin?
> 
> Lexikon för invandrare ("lexicon for immigrants"), http://lexin.nada.kth.se/lexin/

As a native Swede, I don't see any need for this.

> 
> Trond.
> 
> 
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and 
> threat landscape has changed and how IT managers can respond. Discussions 
> will include endpoint security, mobile security and the latest in malware 
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
