Hi Miquel, yes, that was what I had in mind. But it doesn't help much, though.
Next dependency is some Python library for Levenshtein distance ...

There must be an easier way to test the script and see if it gives me
something useful. I'm not interested in testing the other functions right
now. Just compile the script somehow? Or just hard-code paths into the
script?

Yours,
Per Tunedal

On Thu, Feb 20, 2014, at 10:46, Miquel Esplà wrote:
> Hi Per,
>
> I didn't try to compile with the version of Python you are using, but you
> can try to change this condition in configure.ac to do so.
>
> Cheers,
>
> Miquel.
>
>
> 2014-02-20 10:19 GMT+01:00 Per Tunedal <[email protected]>:
>
> > Hi Miquel,
> > Thanks for your thorough answer.
> >
> > I've tried ./autogen.sh
> > I had to install httrack, but then got:
> > checking for a Python interpreter with version >= 2.7... none
> > configure: error: You don't have Python 2.7 or later installed.
> >
> > Is it really necessary to update Python?
> >
> > It appears that the configure script demands Python >= 2.7. In Debian
> > Squeeze, Python 2.6.6 is the default.
> > I'm afraid of messing things up if I install Python manually, and not with
> > Synaptic. Lots of things depend on Python.
> >
> > And upgrading to Debian Wheezy might mess things up as well ...
> >
> > Yours,
> > Per Tunedal
> >
> >
> > On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > 2014-02-18 21:37 GMT+01:00 Per Tunedal <[email protected]>:
> >
> > Hi Miquel,
> > thank you. Looks like a good approach.
> >
> > Looking at the script:
> > It runs GIZA++ in both directions to begin with? I just have to supply the
> > bitext files?
> >
> >
> > Yes, you only need to provide the bitext files compressed with gzip.
> >
> > But the script has some trouble finding the GIZA++ files:
> > per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr
> > "/home/per/corpora/OpenOffice3.fr-sv.sv"
> > "/home/per/corpora/OpenOffice3.fr-sv.fr"
> > "/home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr"
> > TOKENISING THE CORPUS...
> > Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
> > gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
> > Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
> > gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
> > LOWERCASING THE CORPUS...
> > FILTERING OUT TOO LONG SENTENCES...
> > FORMATTING THE CORPUS FOR PROCESSING...
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb": No such file or directory
> > BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
> > CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
> > BUILDING PROBABILISTIC DICTIONARIES...
> > FILTERING DICTIONARY...
> > egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: No such file or directory
> > : No such file or directory
> > bitextor-builddics.in: 173: __PYTHON__: not found
> > DONE!
> >
> >
> > I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in is
> > only the template of the script. What I didn't say is that you need to
> > compile the project to get the true script.
> > If you have a look into the
> > code of the template, you will see that there are many variables starting
> > and ending with "__" (such as __PREFIX__). These variables are
> > replaced by the corresponding paths at compilation time. So, to use the
> > script, you have to download the whole trunk directory, and then run:
> > ./autogen.sh
> > ./configure
> > make
> > make install
> >
> > As you know, you can use the option --prefix=LOCALDIR when running
> > ./configure to install bitextor in a specific path (for example, LOCALDIR
> > could be /home/per/local/).
> >
> > Best,
> >
> > Miquel.
> >
> >
> > Yours,
> > Per Tunedal
> >
> > On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > I think that the explanation in this website:
> > http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It
> > helps a lot to understand the structure and the content of each file
> > generated by OmegaT.
> >
> > About the script, in the last release of bitextor we included a script
> > called "bitextor-builddics" (you can find the template of this script here:
> > https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in)
> > which uses GIZA++ to obtain a plain text bilingual dictionary, but only
> > including pairs of words fulfilling: a) both words occur at least 10 times
> > in the corpus, and b) the harmonic mean of their probabilities in both
> > probabilistic dictionaries (S -> T and T -> S) is higher than 0.2. If you
> > want to use this, I recommend using the version in the trunk, which
> > fixes some minor bugs still present in the release.
> >
> > Best,
> >
> > Miquel.
> >
> > 2014-02-17 14:21 GMT+01:00 Per Tunedal <[email protected]>:
> >
> > Hi Miquel,
> > thank you for your informative answer. Indeed, I needed to create a
> > cooccurrence file.
> > I did successfully create such a file with snt2cooc.out
> >
> > And GIZA++ has run successfully and made a lot of files in my home
> > directory (!).
> >
> > How do I redirect the output to a more suitable folder? -outputpath ?
> >
> > Where can I find an explanation of the content of the files?
> >
> > I suppose the dictionary is in the translation table *.t3.final
> > Any convenient way to extract plain text dictionaries (without going one
> > step further and using Moses)?
> > Is some script available to decode the translation table by using the
> > vocabulary files *.vcb ?
> >
> > Yours,
> > Per Tunedal
> >
> >
> > On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > if I am not wrong, depending on how you compile GIZA++, it can generate
> > the cooccurrence files on-the-fly during alignment, or you may need to do so
> > before running the alignment. Actually, I think that, with the standard
> > compilation, you are in the second case. Have a look here:
> > https://code.google.com/p/giza-pp/issues/detail?id=9 I hope the link will
> > be helpful!
> >
> > Cheers,
> >
> > Miquel.
> >
> > 2014-02-17 10:30 GMT+01:00 Per Tunedal <[email protected]>:
> >
> >
> > Hi,
> > I tried the procedure described at
> > http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough
> > dictionary, but encountered the following error in the last step:
> >
> > ERROR: NO COOCURRENCE FILE GIVEN!
> >
> > Is one step missing in the procedure?
> >
> > Yours,
> > Per Tunedal
> >
> >
> > ------------------------------------------------------------------------------
> > Android apps run on BlackBerry 10
> > Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
> > Now with support for Jelly Bean, Bluetooth, Mapview and more.
> > Get your Android app in front of a whole new audience. Start now.
> > http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
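On the question of hard-coding paths: as explained in the thread, bitextor-builddics.in is a template whose "__NAME__" placeholders are replaced with real paths at build time. A minimal sketch of doing that substitution by hand is given here, assuming only the two placeholders that show up in the error output (__PREFIX__ and __PYTHON__). The helper name fill_template, the example prefix, and the interpreter path are hypothetical; the real template may contain other placeholders, which this sketch leaves untouched so they are easy to spot.

```python
import re

def fill_template(text, values):
    """Replace __NAME__ placeholders with entries from `values`.

    Placeholders without an entry are left as they are, so anything
    still unresolved remains visible in the output.
    """
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"__([A-Z]+)__", substitute, text)

# Hypothetical values: adjust to wherever the bitextor files really are.
values = {"PREFIX": "/home/per/local", "PYTHON": "/usr/bin/python"}

# One line modelled on the path seen in the error message.
line = 'perl "__PREFIX__/share/bitextor/utils/tokenizer.perl"'
print(fill_template(line, values))
```

The same function could be run over the whole template text and the result saved as a new script, but the helper files the script refers to (such as tokenizer.perl) would still have to exist under the chosen prefix.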
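The filtering rule described for bitextor-builddics (both words occur at least 10 times in the corpus, and the harmonic mean of the two translation probabilities is higher than 0.2) can be sketched as follows. This is only an illustration of the rule as stated in the thread, not the script's actual code: the function names, the dictionary-based inputs, and the toy numbers are all assumptions.

```python
def harmonic_mean(p, q):
    """Harmonic mean of two probabilities; 0.0 when both are zero."""
    return 2.0 * p * q / (p + q) if (p + q) > 0 else 0.0

def filter_pairs(s2t, t2s, src_count, tgt_count, min_count=10, min_score=0.2):
    """Return the (s, t) pairs that pass both thresholds.

    s2t[(s, t)] and t2s[(t, s)] hold the probabilities from the two
    directions; src_count and tgt_count map words to corpus frequencies.
    """
    kept = []
    for (s, t), p_st in sorted(s2t.items()):
        p_ts = t2s.get((t, s), 0.0)
        # a) both words must occur at least min_count times
        if src_count.get(s, 0) < min_count or tgt_count.get(t, 0) < min_count:
            continue
        # b) harmonic mean of both directions must exceed min_score
        if harmonic_mean(p_st, p_ts) > min_score:
            kept.append((s, t))
    return kept

# Toy data, invented for illustration (Swedish -> French).
s2t = {("hus", "maison"): 0.8, ("hus", "de"): 0.1, ("katt", "chat"): 0.9}
t2s = {("maison", "hus"): 0.6, ("de", "hus"): 0.05, ("chat", "katt"): 0.9}
src_count = {"hus": 25, "katt": 3}
tgt_count = {"maison": 20, "de": 500, "chat": 4}

print(filter_pairs(s2t, t2s, src_count, tgt_count))  # [('hus', 'maison')]
```

In the toy data, "hus"/"de" fails on the harmonic mean (about 0.07) and "katt"/"chat" fails on frequency, which shows why both conditions are needed: the first removes weak alignments to frequent function words, the second removes pairs seen too rarely to trust.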
