Hi Miquel,
Thanks for your thorough answer.
I've tried ./autogen.sh
I had to install httrack, but then got:
checking for a Python interpreter with version >= 2.7... none
configure: error: You don't have Python 2.7 or later installed.
Is it really necessary to update Python?
It appears that the configure script demands Python >= 2.7. In Debian
Squeeze, Python 2.6.6 is the default.
I'm afraid of messing things up if I install Python manually rather than
with Synaptic. Lots of things depend on Python.
And upgrading to Debian Wheezy might mess things up as well...
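The failing check only compares version numbers. The Python sketch below illustrates that comparison, assuming the interpreter reports its version as text such as "Python 2.6.6". The function names are hypothetical, and this is not the configure script's own test.

```python
import re

MINIMUM = (2, 7)  # the requirement reported by configure above


def parse_version(text):
    """Extract (major, minor, micro) from output such as 'Python 2.6.6'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    major, minor, micro = match.groups()
    return (int(major), int(minor), int(micro or 0))


def meets_minimum(text, minimum=MINIMUM):
    # Tuples compare element by element, so (2, 6) >= (2, 7) is False.
    return parse_version(text)[:len(minimum)] >= minimum


print(meets_minimum("Python 2.6.6"))  # False
print(meets_minimum("Python 2.7.3"))  # True
```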
Yours,
Per Tunedal
On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
Hi Per,
2014-02-18 21:37 GMT+01:00 Per Tunedal <[1][email protected]>:
Hi Miquel,
thank you. Looks like a good approach.
Looking at the script:
Does it run GIZA++ in both directions to begin with? Do I just have to
supply the bitext files?
Yes, you only need to provide the bitext files compressed with gzip.
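The script expects gzip-compressed input. The log below reports "not in gzip format" for the plain-text files, so a small pre-flight check can help. This is a minimal Python sketch. `is_gzipped` and `ensure_gzipped` are hypothetical helpers, and running `gzip` on each file from the shell achieves the same result. The check relies only on the fact that every gzip stream starts with the bytes 0x1f 0x8b.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Return a path to a gzip-compressed version of `path`.

    If the file is already compressed it is returned unchanged; otherwise a
    compressed copy is written next to it as `path + ".gz"`.
    """
    if is_gzipped(path):
        return path
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Unlike a plain `gzip FILE`, this keeps the original uncompressed file in place.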
But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh [2]bitextor-builddics.in sv fr
"/home/per/corpora/[3]OpenOffice3.fr-sv.sv"
"/home/per/corpora/[4]OpenOffice3.fr-sv.fr"
"/home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr"
TOKENISING THE CORPUS...
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/[5]OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/[6]OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb": No such file or directory
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcbegrep: /tmp/tempgizamodel.RlVVs/sv.vcb: No such file or directory
: No such file or directory
[7]bitextor-builddics.in: 173: __PYTHON__: not found
DONE!
I'm sorry, I didn't explain it well. As I said,
[8]bitextor-builddics.in is only the template of the script. What I
didn't say is that you need to build the project to get the real
script. If you look at the code of the template, you will see that
there are many variables starting and ending with "__" (such as
__PREFIX__). These variables are replaced by the corresponding paths
at build time. So, to use the script, you have to download the whole
trunk directory and then run:
./autogen.sh
./configure
make
make install
As you know, you can pass the option --prefix=LOCALDIR to ./configure
to install bitextor in a specific path (for example, LOCALDIR could be
/home/per/local/).
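The following Python sketch shows the template mechanism. It only illustrates the idea of filling in `__NAME__` placeholders. The real substitution is done by the project's build files, and `fill_template` is a hypothetical name. It also shows why running the template directly fails: placeholders such as `__PREFIX__` and `__PYTHON__` stay in the text literally. This matches the "Can't open perl script" and "__PYTHON__: not found" errors above.

```python
import re

# Upper-case names wrapped in double underscores, e.g. __PREFIX__.
PLACEHOLDER = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def fill_template(text, values):
    """Replace __NAME__ placeholders; raise KeyError if any lack a value."""
    missing = set()

    def substitute(match):
        name = match.group(1)
        if name not in values:
            missing.add(name)
            return match.group(0)  # leave it untouched, report it below
        return values[name]

    result = PLACEHOLDER.sub(substitute, text)
    if missing:
        raise KeyError("no value for: " + ", ".join(sorted(missing)))
    return result
```

For example, with `{"PREFIX": "/home/per/local"}` the tokenizer path becomes `/home/per/local/share/bitextor/utils/tokenizer.perl`.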
Best,
Miquel.
Yours,
Per Tunedal
On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
Hi Per,
I think that the explanation on this website:
[9]http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful.
It helps a lot in understanding the structure and content of each file
generated by GIZA++.
About the script: in the latest release of bitextor we included a script
called "bitextor-builddics" (you can find the template of this script
here:
[10]https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in
) which uses GIZA++ to obtain a plain-text bilingual dictionary, but
only including word pairs that satisfy two conditions: a) both words
occur at least 10 times in the corpus, and b) the harmonic mean of
their probabilities in both probabilistic dictionaries (S -> T and
T -> S) is higher than 0.2. If you want to use it, I recommend using the
version in the trunk, which fixes some minor bugs still present in the
release.
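The two filtering criteria can be sketched in Python as follows. The harmonic mean of p = p(t|s) and q = p(s|t) is 2pq / (p + q). For example, p = 0.6 and q = 0.3 give 0.36 / 0.9 = 0.4, which passes the 0.2 threshold. By contrast, p = 0.5 and q = 0.1 give 0.1 / 0.6 ≈ 0.167, which does not. This is an illustration only, not the code of bitextor-builddics. The dictionary-based input format and the function names are assumptions.

```python
def harmonic_mean(p, q):
    # Defined as 0 when both probabilities are 0, avoiding division by zero.
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)


def filter_pairs(src_to_tgt, tgt_to_src, src_freq, tgt_freq,
                 min_count=10, threshold=0.2):
    """Keep (s, t) pairs meeting both criteria; map each to its score.

    src_to_tgt[(s, t)] holds p(t|s), tgt_to_src[(t, s)] holds p(s|t), and
    src_freq / tgt_freq map each word to its frequency in the corpus.
    """
    kept = {}
    for (s, t), p_st in src_to_tgt.items():
        # a) both words must occur at least `min_count` times
        if src_freq.get(s, 0) < min_count or tgt_freq.get(t, 0) < min_count:
            continue
        # b) harmonic mean of both directions must exceed `threshold`
        score = harmonic_mean(p_st, tgt_to_src.get((t, s), 0.0))
        if score > threshold:
            kept[(s, t)] = score
    return kept
```

The harmonic mean is low whenever either direction is low, so a pair survives only if the two dictionaries agree.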
Best,
Miquel.
2014-02-17 14:21 GMT+01:00 Per Tunedal <[11][email protected]>:
Hi Miquel,
thank you for your informative answer. Indeed, I needed to create a
cooccurrence file.
I did successfully create such a file with snt2cooc.out
And GIZA++ has run successfully and made a lot of files in my home
directory (!).
How do I redirect the output to a more suitable folder? -outputpath ?
Where can I find an explanation of the content of the files?
I suppose the dictionary is in the translation table *.t3.final
Is there a convenient way to extract plain-text dictionaries (without
going one step further and using Moses)?
Is there a script available to decode the translation table using the
vocabulary files *.vcb ?
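Decoding the translation table amounts to replacing word ids with the words listed in the *.vcb files. The Python sketch below assumes the usual GIZA++ layouts: each *.vcb line is "id word count", each t-table line is "source_id target_id probability", and id 0 stands for the NULL word. These layouts, the function names and the `min_prob` cut-off are assumptions to verify against your own files.

```python
def read_vocab(lines):
    """Map word ids to words from .vcb lines of the form 'id word count'."""
    vocab = {0: "NULL"}  # id 0 is assumed to be the empty (NULL) word
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            vocab[int(fields[0])] = fields[1]
    return vocab


def decode_ttable(lines, src_vocab, tgt_vocab, min_prob=0.0):
    """Yield (source_word, target_word, probability) from t-table lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        s_id, t_id, prob = int(fields[0]), int(fields[1]), float(fields[2])
        if prob >= min_prob:
            # Unknown ids are shown as <id> rather than dropped.
            yield (src_vocab.get(s_id, "<%d>" % s_id),
                   tgt_vocab.get(t_id, "<%d>" % t_id),
                   prob)
```

In use, both .vcb files would be opened and passed to `read_vocab`, and the open *.t3.final file to `decode_ttable`, printing each tuple as a tab-separated line.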
Yours,
Per Tunedal
On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote:
Hi Per,
if I'm not mistaken, depending on how you compile GIZA++, it either
generates the cooccurrence files on the fly during alignment, or you
need to generate them before running the alignment. Actually, I think
that with the standard compilation you are in the second case. Have a
look here: [12]https://code.google.com/p/giza-pp/issues/detail?id=9
I hope the link will be helpful!
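A cooccurrence file lists which source and target word ids appear together in at least one sentence pair. The Python sketch below builds that set from the lines of a *.snt file. It assumes the usual three-line layout per sentence pair (a count, the source ids, the target ids) and treats id 0 as the NULL word. It only illustrates what the file contains. In practice snt2cooc.out (mentioned in this thread) is the tool to run, and its exact output format should be checked rather than inferred from this sketch.

```python
def cooccurrences(snt_lines):
    """Collect sorted (source_id, target_id) pairs sharing a sentence pair.

    Assumes three lines per sentence pair: a repetition count, the source
    word ids, then the target word ids. Source id 0 (NULL) is added to every
    sentence, since any target word may be aligned to NULL.
    """
    rows = [line.split() for line in snt_lines]
    pairs = set()
    for i in range(0, len(rows) - 2, 3):
        source = set(int(x) for x in rows[i + 1]) | {0}
        target = set(int(x) for x in rows[i + 2])
        for s in source:
            for t in target:
                pairs.add((s, t))
    return sorted(pairs)
```

Each resulting pair would correspond to one "source_id target_id" line of the file.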
Cheers,
Miquel.
2014-02-17 10:30 GMT+01:00 Per Tunedal <[13][email protected]>:
Hi,
I tried the procedure described at
[14]http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough
dictionary, but encountered the following error in the last step:
ERROR: NO COOCURRENCE FILE GIVEN!
Is a step missing from the procedure?
Yours,
Per Tunedal
------------------------------------------------------------------------------
Android apps run on BlackBerry 10
Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
Now with support for Jelly Bean, Bluetooth, Mapview and more.
Get your Android app in front of a whole new audience. Start now.
[15]http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
_______________________________________________
Apertium-stuff mailing list
[16][email protected]
[17]https://lists.sourceforge.net/lists/listinfo/apertium-stuff
References
1. mailto:[email protected]
2. http://bitextor-builddics.in/
3. http://OpenOffice3.fr-sv.sv/
4. http://OpenOffice3.fr-sv.fr/
5. http://OpenOffice3.fr-sv.sv/
6. http://OpenOffice3.fr-sv.fr/
7. http://bitextor-builddics.in/
8. http://bitextor-builddics.in/
9. http://rali.iro.umontreal.ca/rali/?q=en/node/1325
10. https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in
11. mailto:[email protected]
12. https://code.google.com/p/giza-pp/issues/detail?id=9
13. mailto:[email protected]
14. http://wiki.apertium.org/wiki/Using_GIZA%2B%2B
15. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
16. mailto:[email protected]
17. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
18. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
19. mailto:[email protected]
20. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
21. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
22. mailto:[email protected]
23. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
24. http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
25. mailto:[email protected]
26. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
27. http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
28. mailto:[email protected]
29. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
30. http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
31. mailto:[email protected]
32. https://lists.sourceforge.net/lists/listinfo/apertium-stuff