Hi Miquel,
Thanks for your thorough answer.

I've tried ./autogen.sh
I had to install httrack, but then got:
checking for a Python interpreter with version >= 2.7... none
configure: error: You don't have Python 2.7 or later installed.

Is it really necessary to update Python?

It appears that the configure script demands Python >= 2.7. In Debian
Squeeze, Python 2.6.6 is the default.
I'm afraid of messing things up if I install Python manually, and not
with Synaptic. Lots of things depend on Python.

And upgrading to Debian Wheezy might mess things up as well ...

Yours,
Per Tunedal


On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:

Hi Per,

2014-02-18 21:37 GMT+01:00 Per Tunedal <[email protected]>:

Hi Miquel,
thank you. Looks like a good approach.

Looking at the script:
It runs GIZA++ in both directions to begin with? I just have to supply
the bitext files?


Yes, you only need to provide the bitext files compressed with gzip.
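
For illustration, a minimal Python sketch of compressing a plain-text
bitext file with gzip before passing it to the script. The helper name
gzip_file and the ".gz" suffix are assumptions made for this example;
the thread only states that the files must be gzip-compressed.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src):
    """Write a gzip-compressed copy of src next to it and return its path.

    Hypothetical helper: the ".gz" suffix is a convention chosen here,
    not something the bitextor script is known to require.
    """
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".gz")
    # Stream the copy so that large corpora are not loaded into memory at once.
    with open(src_path, "rb") as f_in, gzip.open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst_path
```

Calling gzip_file() on each side of the bitext would leave the original
files in place and create compressed copies beside them.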


But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr
"/home/per/corpora/OpenOffice3.fr-sv.sv"
"/home/per/corpora/OpenOffice3.fr-sv.fr"
"/home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr"
TOKENISING THE CORPUS...
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb": No such file or directory
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcb: No such file or directory
egrep: /tmp/tempgizamodel.RlVVs/sv.vcb: No such file or directory
bitextor-builddics.in: 173: __PYTHON__: not found
DONE!


I'm sorry, I didn't explain it well: as I said,
bitextor-builddics.in is only the template of the script. What I
didn't say is that you need to build the project to get the actual
script. If you have a look at the code of the template, you will see
that there are many variables starting and ending with "__" (such as
__PREFIX__). These variables are replaced by the corresponding paths
at build time. So, to use the script, you have to download the
whole trunk directory, and then run:
./autogen.sh
./configure
make
make install

As you know, you can use the option --prefix=LOCALDIR when running
./configure to install bitextor in a specific path (for example
LOCALDIR could be /home/per/local/).
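
To make the placeholder mechanism concrete, here is a small Python
sketch of what replacing the "__NAME__" variables amounts to. It is only
an illustration: the real substitution is done by the project's build
system, and the function fill_template and the value used for PREFIX
are assumptions made for this example.

```python
import re


def fill_template(template, values):
    """Replace each __NAME__ placeholder that has an entry in values."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders untouched so they are easy to spot.
        return values.get(name, match.group(0))
    return re.sub(r"__([A-Z]+)__", substitute, template)


# One line in the style of the template, with an assumed install prefix.
template = 'perl "__PREFIX__/share/bitextor/utils/tokenizer.perl"'
print(fill_template(template, {"PREFIX": "/home/per/local"}))
# prints: perl "/home/per/local/share/bitextor/utils/tokenizer.perl"
```

Running the unprocessed template directly leaves strings such as
__PREFIX__ and __PYTHON__ in place, which is why the shell reported
them as missing files and commands.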

Best,

Miquel.


Yours,
Per Tunedal

On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:

Hi Per,

I think that the explanation on this website:
http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful.
It helps a lot in understanding the structure and the content of each
file generated by GIZA++.

About the script: in the latest release of bitextor we included a script
called "bitextor-builddics" (you can find the template of this script
here:
https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in
), which uses GIZA++ to obtain a plain-text bilingual dictionary that
only includes word pairs fulfilling two conditions: a) both words occur
at least 10 times in the corpus, and b) the harmonic mean of their
probabilities in both probabilistic dictionaries (S -> T and T -> S) is
higher than 0.2. If you want to use it, I recommend using the version in
the trunk, which fixes some minor bugs still present in the release.
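
The two filtering conditions above can be sketched in Python as follows.
This is an illustration of the criteria as described, not the code of
bitextor-builddics itself; the function names and the example numbers
are assumptions.

```python
def harmonic_mean(p_st, p_ts):
    """Harmonic mean of the S -> T and T -> S translation probabilities."""
    if p_st + p_ts == 0:
        return 0.0
    return 2 * p_st * p_ts / (p_st + p_ts)


def keep_pair(count_s, count_t, p_st, p_ts, min_count=10, min_score=0.2):
    """Apply both conditions: frequency of each word and harmonic mean."""
    frequent_enough = count_s >= min_count and count_t >= min_count
    return frequent_enough and harmonic_mean(p_st, p_ts) > min_score


# A pair that is frequent and consistent in both directions is kept.
print(keep_pair(12, 30, 0.5, 0.5))
# A pair with one rare word is dropped, whatever its probabilities.
print(keep_pair(9, 30, 0.9, 0.9))
```

The harmonic mean penalises pairs that are likely in only one direction:
probabilities of 0.9 and 0.1 give 0.18, which falls below the 0.2
threshold even though their arithmetic mean is 0.5.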

Best,

Miquel.

2014-02-17 14:21 GMT+01:00 Per Tunedal <[email protected]>:

Hi Miquel,
thank you for your informative answer. Indeed, I needed to create a
cooccurrence file.
I successfully created such a file with snt2cooc.out.

And GIZA++ has run successfully and made a lot of files in my home
directory (!).

How do I redirect the output to a more suitable folder? -outputpath ?

Where can I find an explanation of the content of the files?

I suppose the dictionary is in the translation table *.t3.final.
Is there any convenient way to extract plain-text dictionaries (without
going one step further and using Moses)?
Is there some script available to decode the translation table by using
the vocabulary files *.vcb?
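
A rough Python sketch of such a decoding step, under stated assumptions:
each *.vcb line is taken to be "id word count" and each translation
table line "source_id target_id probability". Both layouts, and which
vocabulary file belongs to which column, should be checked against the
actual files; the function names are made up for this example.

```python
def load_vocab(lines):
    """Map numeric ids to words from .vcb-style lines: 'id word count'."""
    vocab = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            vocab[parts[0]] = parts[1]
    return vocab


def decode_table(table_lines, src_vocab, tgt_vocab):
    """Yield (source_word, target_word, probability) triples.

    Assumes 'source_id target_id probability' per line. Ids that are
    missing from a vocabulary are rendered as '<id>' instead of failing.
    """
    for line in table_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src_id, tgt_id, prob = parts
        src_word = src_vocab.get(src_id, "<%s>" % src_id)
        tgt_word = tgt_vocab.get(tgt_id, "<%s>" % tgt_id)
        yield src_word, tgt_word, float(prob)
```

Since the functions accept any iterable of lines, open file objects for
the two *.vcb files and the *.t3.final file can be passed in directly.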

Yours,
Per Tunedal



On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote:

Hi Per,

If I am not wrong, depending on how you compile GIZA++, it can either
generate the cooccurrence files on the fly during alignment, or you may
need to create them before running the alignment. Actually, I think
that, with the standard compilation, you are in the second case. Have a
look here:
https://code.google.com/p/giza-pp/issues/detail?id=9
I hope the link will be helpful!

Cheers,

Miquel.


2014-02-17 10:30 GMT+01:00 Per Tunedal <[email protected]>:

  Hi,
  I tried the procedure described at
  http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough
  dictionary, but encountered the following error in the last step:
  ERROR: NO COOCURRENCE FILE GIVEN!
  Is one step missing in the procedure?
  Yours,
  Per Tunedal
  _______________________________________________
  Apertium-stuff mailing list
  [email protected]
  https://lists.sourceforge.net/lists/listinfo/apertium-stuff

