Great! A couple of caveats about working with the lt-expand output:
* It will include multiword expressions -- not all of which you will
  want. You can grep them out with grep -v '#'
* It will include clitic forms and contractions/abbreviations -- which
  you might also not want. You can grep them out with grep -v '+'

Let us know how you get on!

Regards,

Fran

On Fri, 28 Sep 2012 at 17:24 -0400, Steve Rawlinson wrote:
> Hi Fran,
>
> Wow, thanks for the helpful instructions and quick reply! It looks
> like it will do exactly what I need. I'll give it a try.
>
> As for what I am intending to use the output for, I'm looking into
> creating a translation dictionary for a certain platform. It would be
> sold commercially, but I would follow the GPL license rules of
> course. If it's profitable, then I'd be happy to give back to the
> Apertium community by sponsoring work or contributing in some other
> way.
>
> Thanks!
> Steve
>
> On Fri, Sep 28, 2012 at 5:10 PM, Francis Tyers <[email protected]>
> wrote:
> > On Fri, 28 Sep 2012 at 21:07 +0000, Francis Tyers wrote:
> > > On Fri, 28 Sep 2012 at 16:53 -0400, Steve Rawlinson wrote:
> > > > Hello,
> > > >
> > > > I'm looking to hire/sponsor a developer that is familiar with
> > > > Apertium dix/metadix formats. I hope it's ok to post this request
> > > > here. I'd be happy to have the resulting work put under the GPL
> > > > (or similar open source license) and contributed back to Apertium.
> > > > Perhaps as part of dixtools or as a new feature?
> > > >
> > > > Here is what I need, a program that can read a bilingual
> > > > dictionary (dix or metadix?) and output all the word pair
> > > > translations, including all the conjugations, inflected forms,
> > > > plurals, etc. (any possible variations on the words) that are
> > > > available in the dictionary for the left side. I've looked around
> > > > in the dixtools and lttoolbox and I don't see anything that does
> > > > this, but maybe I've missed it?
> > > >
> > > > This command is pretty close to what I need:
> > > >
> > > > "apertium-dixtools list pairs"
> > > >
> > > > But it doesn't seem to do the inflected forms. If I understand
> > > > things correctly, this should be possible by making use of the
> > > > paradigms in the mono dictionaries.
> > > >
> > > > Here's a quick example of what I need for the Spanish verb
> > > > "tener":
> > > >
> > > > tengo/to have
> > > > tienes/to have
> > > > tiene/to have
> > > > tenemos/to have
> > > > tenéis/to have
> > > > tienen/to have
> > > >
> > > > Plus all the other conjugations that are in the mono dictionary
> > > > paradigm (future, imperfect, etc.) If someone knows how to do this
> > > > with the current tools please let me know!
> > > >
> > > > If nothing currently exists, then adding to dixtools in Java might
> > > > make the most sense. I personally prefer to work in Python, but
> > > > I'd be open to any language.
> > > >
> > > > If you're interested in working on this feature please post a
> > > > reply or email me directly with your interest and a proposal (just
> > > > a quick outline of how you'd do this, costs, etc.)
> > >
> > > Hi there!
> > >
> > > Here is a sequence of commands that will more or less get you what
> > > you want, without having to use python or java or anything.
> > >
> > > Step 1: Expand the source language morphological dictionary.
> > >
> > > $ lt-expand apertium-en-es.es.dix > /tmp/es.exp
> > >
> > > $ head /tmp/es.exp
> > > abyectas:abyecto<adj><f><pl>
> > > abyecta:abyecto<adj><f><sg>
> > > abyectos:abyecto<adj><m><pl>
> > >
> > > Step 2: Pass the lexical form side through the bilingual dictionary.
> > >
> > > cat /tmp/es.exp | sed 's/:>:/:/g' | sed 's/:<:/:/g' | cut -f2 -d':' |
> > > sed 's/^/^/g' | sed 's/$/$/g' | lt-proc -b es-en.autobil.bin >
> > > /tmp/es-en.exp
> > >
> > > ^abyecto<adj><f><pl>/abject<adj><f><pl>$
> > > ^abyecto<adj><f><sg>/abject<adj><f><sg>$
> > > ^abyecto<adj><m><pl>/abject<adj><m><pl>$
> > > ^abyecto<adj><m><sg>/abject<adj><m><sg>$
> > >
> > > Step 3: Paste the output together.
> > >
> > > $ paste /tmp/es.exp /tmp/es-en.exp | sed 's/:>:/:/g' |
> > > sed 's/:<:/:/g' | sed 's/:/\t/g'| sed 's/\//\t/1' | cut -f1,4 |
> > > head | cut -f1 -d'<' | sed 's/\t/\/ /g' | head
> > >
> > > abyectas/ abject
> > > abyecta/ abject
> > > abyectos/ abject
> > > abyecto/ abject
> > > abyectísimas/ abject
> > > abyectísima/ abject
> > >
> > > If you want something more involved, you could try getting into
> > > contact with Prompsit Language Engineering, who offer services
> > > based around Apertium. Their email: [email protected]
> > >
> > > Regards,
> >
> > PS. What were you intending to use the output for -- if you don't
> > mind me asking :)
> >
> > F.
> >
> > ------------------------------------------------------------------------------
> > Got visibility?
> > Most devs has no idea what their production app looks like.
> > Find out how fast your code is with AppDynamics Lite.
> > http://ad.doubleclick.net/clk;262219671;13503038;y?
> > http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
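
For reference, the two grep caveats at the top of this message can be folded into a single filter. This is only a sketch: the sample lines below are made up to imitate lt-expand's surface:lexical format (they are not copied from the real dictionary), and the temporary file is a stand-in for /tmp/es.exp.

```shell
#!/bin/sh
# Hypothetical sample imitating lt-expand output; real input would come from:
#   lt-expand apertium-en-es.es.dix > /tmp/es.exp
sample=$(mktemp)
cat > "$sample" <<'EOF'
abyectas:abyecto<adj><f><pl>
echaba de menos:echar<vblex><pii><p3><sg># de menos
del:de<pr>+el<det><def><m><sg>
abyecta:abyecto<adj><f><sg>
EOF

# Drop every line containing '#' (multiword marker) or '+' (joined
# clitic/contraction analyses) in one pass. Inside a bracket expression
# both characters are literal.
filtered=$(grep -v '[#+]' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

On a real run you would point grep at /tmp/es.exp and write the result to a new file before Step 2, so that the expanded list and the bilingual output still line up for paste.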
