---------- Forwarded message ----------
From: Juan Miguel Cejuela <[email protected]>
Date: 2011/4/7
Subject: Fwd: [Apertium-stuff] GSoC & Master Thesis project together
To: [email protected]
Cc: [email protected]


Hej Jacob,

I guess that by the time you read this email you'll already have seen my
mail on the main Apertium list about the possibility of combining my
master's thesis with the GSoC project.

I've discussed with Jim on IRC the idea you two had about a year ago,
designed on paper plates ;), of using a second level of transducers instead
of flag diacritics for state pruning. The conversation with him is below.

And yes, I'd love to use Java :P

I'm now working hard to have a good GSoC application ready. I only learned
about the Apertium project today, and because the deadline is tomorrow I've
had to scramble to organize many things for the project with my home
university (Technical University of Munich, TUM) and my possible thesis
advisor there, Hasan Ibne Akram
<http://www.sec.in.tum.de/hasan-ibne-akram/>.

The thing is the following. I like Jim's proposed project a lot and would
*love* to work with transducers in machine translation for my master's
thesis (framed within the GSoC). However, I, and especially my possible
advisor, would like to have some more training/learning involved. I believe
your group despises SMT, so I'll be delicate with my words :) That is
understandable, of course, since your platform is rule-based.

In any case, such a GSoC project should be expanded to comply with the
higher requirements of a master's thesis, especially its more scientific
nature and formal correctness, in contrast with a more purely code-based
GSoC project. Because of this, my advisor and I have thought of 3 different
project possibilities:


   1. Go exactly with Jim's description of cascaded transducers to implement
   state pruning, plus some extension to fill out a master's thesis. In this
   case my possible advisor would actually not be that interested, so, if you
   wanted, you could become both my mentor for the GSoC project and my
   advisor for my thesis. I believe you're an associate professor in computer
   science at a Danish university, though I don't know which. By the way, I
   studied for a year in Denmark at Aarhus Universitet, where I did my
   bachelor's thesis in the bioinformatics department and wrote a complete
   library for HMMs in Common Lisp.
   2. Expand Jim's description to include some learning. Specifically, my
   advisor is mostly interested in topology learning, i.e., the
   (semi-)automatic construction of the transducer's architecture given the
   problem and the corpus data to operate on. He is writing his PhD
   dissertation on this and uses/studies, for example, the OSTIA
   state-merging algorithm (for more info, check Learning Finite State
   Transducers
   <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9855&rep=rep1&type=pdf>).
   3. Go with something different. We still have to work out the exact
   project tomorrow, but it would involve the use of your Southeast European
   Times corpus <http://xixona.dlsi.ua.es/%7Efran/setimes/>, in which we're
   very interested, and again use transducer learning and machine learning.


So that's it. The point is, what are you most interested in? Do you like
our proposals?


Today and tomorrow I will work hard to prepare a more formal specification
of our proposed projects and send my application(s) via the GSoC page. But,
as I said, I would love to know your opinion beforehand.


Thank you so much. Tusind tak :)



---------- Forwarded message ----------
From: Jimmy O'Regan <[email protected]>
Date: 2011/4/7
Subject: Re: [Apertium-stuff] GSoC & Master Thesis project together
To: Juan Miguel Cejuela <[email protected]>, Jacob Nordfalk <
[email protected]>


On 7 April 2011 15:25, Juan Miguel Cejuela <[email protected]> wrote:
> So before further elaborating on an exact project, my question for you,
> mainly for Jimregan, Francis Tyers, and Jacob Nordfalk, is: would you be
> interested in such a project, mentoring a GSoC project that would be
> further expanded to comply with the higher requirements of a master thesis?

Hi Jacob.

Juan Miguel is pretty advanced in terms of FSTs, so I explained to him
the idea from the mentor summit last year. He says he's more fluent in
Java, so maybe you'd like to have a chat with him about it?

[16:04]  <jimregan> Jacob and I designed an alternative to flag
diacritics on the back of some paper plates last year
[16:04]  <jimregan> using a second transducer
[16:04]  <jimregan> (the original motivation behind flag diacritics -
which are an ugly hack - was to avoid using a second transducer)
[16:05]  <jimregan> we use a fairly large number of transducers, in
almost every phase of the translation pipeline
[16:05]  <jimregan> so we thought it wouldn't be unreasonable to have
a second transducer to use in state pruning, rather than just
balancing special symbols
[16:07]  <jimregan> the second transducer would basically take the
same form as translation rules in apertium-transfer, but with the
ability to fully or partially lexicalise (possibly based on a subset
of regexes)
[16:07]  <jimregan> if the initial analysis involves a continuation,
pass it to the second for validation
[16:07]  <jmcejuela> aha
[16:07]  <jimregan> prune it if it doesn't match
[16:08]  <jimregan> take the symbols of the second transduction for
output if it does
[16:08]  <jimregan> Jacob was pretty passionate about implementing it
himself, but I think he'd be interested in mentoring it
[16:10]  <jimregan> I just remembered how to summarise it: two level
morphology as cascading transducers
[16:11]  <jimregan> but, as it's cascading, two level could
potentially be made n-level
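
As a rough illustration only, the pruning step described in the log above
might be sketched as follows. This is not code from Apertium or lttoolbox:
the names Analysis, Rule and prune, and the idea of marking a first-level
analysis with a "continuation" flag, are hypothetical stand-ins for whatever
the real design would use. The second level is modelled as a list of rules
that match a tag sequence and may optionally be lexicalised with a regex on
the lemma.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Analysis:
    """One candidate analysis from the first-level transducer (hypothetical)."""
    lemma: str
    tags: Tuple[str, ...]
    continuation: bool = False  # True if it must be validated by the second level


@dataclass(frozen=True)
class Rule:
    """One second-level rule: a tag pattern, optionally lexicalised."""
    lemma_pattern: Optional[str]  # None means the rule is not lexicalised
    tags: Tuple[str, ...]
    output: Tuple[str, ...]       # symbols to emit when the rule matches

    def matches(self, analysis: Analysis) -> bool:
        if self.tags != analysis.tags:
            return False
        if self.lemma_pattern is None:
            return True
        return re.fullmatch(self.lemma_pattern, analysis.lemma) is not None


def prune(analyses: List[Analysis], rules: List[Rule]) -> List[Analysis]:
    """Keep plain analyses; validate continuations against the second level."""
    kept = []
    for analysis in analyses:
        if not analysis.continuation:
            kept.append(analysis)
            continue
        for rule in rules:
            if rule.matches(analysis):
                # Take the symbols of the second transduction for output.
                kept.append(Analysis(analysis.lemma, rule.output, False))
                break
        # If no rule matched, the analysis is pruned (not added to kept).
    return kept
```

Because the output of prune has the same shape as its input, the same step
could in principle be applied again with another rule set, which is one way
to read the "n-level" remark.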


--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you



-- 
Juan Miguel Cejuela
   jmcejuela.com
   [email protected]
   twitting @jmcejuela <http://twitter.com/jmcejuela>
   +49 176 627 581 05

1s+ Yo*

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff