On Tue, 12 Mar 2013 at 18:19 +0100, Mikel L. Forcada wrote:
> Hi Apertiumers:
>
> I have some more ideas to add to Fran's. And there are a few more I
> cannot remember now, so I may come back. My ideas are usually
> outlandish and challenging, and Fran usually says they are not proper
> GSoC projects but hey, don't we want the best students?

Yes we do! :)

> (1) Sliding-window part-of-speech tagger. The idea is to implement the
> unsupervised part-of-speech tagger
> (http://en.wikipedia.org/wiki/Sliding_window_based_part-of-speech_tagging)
> as a drop-in replacement for the current hidden-Markov-model tagger.
> Ideally, it should have support for unknown words, and also for
> "forbid" descriptions (not described in the paper). The tagger has a
> very intuitive interpretation (believe me, even if you find the maths
> a bit daunting). I am available for questions (I invented the tagger,
> I should be able to remember!).

I think this would make a great project; we really need improved
morphological disambiguation in canonical Apertium. It's particularly
nice in that it can be represented as an FST (hopefully with lttoolbox).
I'm not sure I quite understand the results in the paper, though. The
tagger performed better than a bigram HMM trained with Baum-Welch, but
even then had a ~35% error rate?

> (2) Improving the web-based dictionary maintenance tool developed by
> Daniel Torregrosa-Rivero
> (http://apertium.vm.bytemark.co.uk/simpledix/): create configuration
> files for other language pairs and entry types, etc. The code is
> available at:
> http://apertium.svn.sf.net/viewvc/apertium/trunk/apertium-simpledix/ .
> This is related, I think, to Fran's 2). This is, to my knowledge, the
> most promising alternative to editing XML .dix files directly to add
> simple entries, but I might be wrong.

Yep, this would be a fine project.

> (3) A preprocessor or compiler to avoid having to write structural
> transfer (i.e., .t1x, .t2x and .t3x) rules in raw XML which is very
> overt and clear, but clumsy and hard to write. Before Apertium, in
> interNOSTRUM.com we had a language for .t1x-style files called
> MorphTrans, which is described in the paper
> http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/3355/1843 .
> I believe this language is much easier to write; it should be upgraded
> and documented. The preprocessor would read .mt1, .mt2, and .mt3 files
> in MorphTrans-style format (with keywords in English) and generate the
> current XML. There would also be the opposite tool (much easier to
> write as an XSLT stylesheet) to generate MorphTrans-style code from
> current XML code. MorphTrans can of course be redesigned a bit, and,
> in fact, it should.

I love the "si ... altrament" :D And yes, this would be a fine project,
I think. One of the challenges would be writing the validator, though.

> (3') The same for .dix files. Two roundtrip converters to use the old
> interNOSTRUM-style format
> (http://www.sepln.org/revistaSEPLN/revista/25/25-Pag93.pdf), which is
> much easier to write.

IMHO, this is basically reinventing lexc -- were there validators
available for it?

> (4) One step beyond (3): a visual interface to writing structural
> transfer rules. One would have to invent something, starting perhaps
> with a visual rendering of block structure in the original XML
> language: how about something like Scratch (http://scratch.mit.edu/),
> where jigsaw-puzzle-style pieces only fit if the syntax is right...?

This is a nice idea :)

> (5) Extending the .dix language (and modifying lt-proc or writing a
> pre-processor to it) to be able to deal with the kind of stuff that
> some people miss in the .dix (and .metadix) formats and makes them use
> HFST, which means that people have to mix two different dictionary
> formats in the same language pair. And yes, of course, having
> something that translates the current HFST format to the new superdix
> format. Yes, you guessed, I'd love to throw HFST overboard. I can
> tolerate it as a temporary heresy to keep the church of Apertium
> together, but, as co-pope [1], I'd like to canonicalize Apertium in
> the end. And it would be easier to deal with prefixes, hey, Jonathan?

Yes, this is a great idea too!
It's partly taken into account in:

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox

"Flag diacritics" is a bit of an odd term which basically means
"constraints which forbid/enforce certain non-adjacent morpheme
combinations".

And also, partially, in:

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Closer_integration_with_HFST

"Closer integration" sounds a bit "ecumenical", but actually the first
point is about coming up with a way of representing things like
archiphonemes in an lttoolbox-like fashion.

Feel free to edit these pages, adding your own ideas. Or we could just
add a new page.

> (6) Tools to order .dixes and point at "bad coding style" (which would
> have to be defined). My recollection is that the current .dix format
> is too powerful and allows almost anything. I have to think more about
> this idea, but I couldn't help throwing it out at you.

We have the idea "lint for Apertium", which is quite similar to this
one:

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/lint_for_Apertium

> I think that is enough for the moment, don't you folks think?
>
> In connection with Fran's 3), one could perhaps take a look at
> Retratos http://sourceforge.net/projects/retratos/ . I can talk to
> Helena de Medeiros Caseli who is the admin of that project.

Yes, my idea (3) was basically to come up with something similar to
Retratos, but more advanced in that it takes into account the paradigms
and the necessity of having different kinds of entries in the bilingual
dictionary.

...

> [1] It does not matter who the real Apertium pope is. It's always
> going to be "who's that guy in white next to Fran Tyers?".

:D

F.

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
