Hi Miquel,
thank you. Looks like a good approach.

Looking at the script:
Does it run GIZA++ in both directions to begin with, so that I just have
to supply the bitext files?

But the script has some trouble finding the GIZA++ files:
per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr
"/home/per/corpora/OpenOffice3.fr-sv.sv"
"/home/per/corpora/OpenOffice3.fr-sv.fr"
"/home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr"
TOKENISING THE CORPUS...
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
LOWERCASING THE CORPUS...
FILTERING OUT TOO LONG SENTENCES...
FORMATTING THE CORPUS FOR PROCESSING...
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb": No such file or directory
mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb": No such file or directory
BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
BUILDING PROBABILISTIC DICTIONARIES...
FILTERING DICTIONARY...
egrep: /tmp/tempgizamodel.RlVVs/fr.vcb: No such file or directory
egrep: /tmp/tempgizamodel.RlVVs/sv.vcb: No such file or directory
bitextor-builddics.in: 173: __PYTHON__: not found
DONE!

Yours,
Per Tunedal

On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:

Hi Per,

I think the explanation on this website:
[1]http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful.
It helps a lot in understanding the structure and content of each file
generated by OmegaT.

About the script: in the latest release of bitextor we included a script
called "bitextor-builddics" (you can find the template of this script
here:
[2]https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in).
It uses GIZA++ to obtain a plain-text bilingual dictionary, keeping only
the word pairs for which a) both words occur at least 10 times in the
corpus, and b) the harmonic mean of their probabilities in the two
probabilistic dictionaries (S -> T and T -> S) is higher than 0.2.
If you want to use it, I recommend using the version in the trunk,
which fixes some minor bugs still present in the release.
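
The two filtering criteria above can be illustrated with a short Python
sketch. This is not the code used in bitextor-builddics; the function
names, the dictionary layout (word pair -> probability) and the default
thresholds as parameters are assumptions made only for illustration.

```python
def harmonic_mean(p_st, p_ts):
    """Harmonic mean of two probabilities; 0.0 if either one is zero."""
    if p_st <= 0.0 or p_ts <= 0.0:
        return 0.0
    return 2.0 * p_st * p_ts / (p_st + p_ts)


def filter_pairs(p_s2t, p_t2s, src_counts, tgt_counts,
                 min_count=10, min_hmean=0.2):
    """Keep the (source, target) pairs that satisfy both criteria.

    p_s2t maps (source, target) to the S -> T probability,
    p_t2s maps (target, source) to the T -> S probability,
    src_counts / tgt_counts map a word to its corpus frequency.
    (Hypothetical data layout, chosen for this sketch.)
    """
    kept = []
    for (src, tgt), p_st in p_s2t.items():
        # Criterion a): both words occur at least min_count times.
        if src_counts.get(src, 0) < min_count:
            continue
        if tgt_counts.get(tgt, 0) < min_count:
            continue
        # Criterion b): harmonic mean of both directions above threshold.
        p_ts = p_t2s.get((tgt, src), 0.0)
        if harmonic_mean(p_st, p_ts) > min_hmean:
            kept.append((src, tgt))
    return sorted(kept)
```

A pair that is probable in only one direction gets a low harmonic mean,
so it is dropped even if the other direction looks confident.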

Best,

Miquel.


2014-02-17 14:21 GMT+01:00 Per Tunedal <[3][email protected]>:

Hi Miquel,
thank you for your informative answer. Indeed, I needed to create a
cooccurrence file.
I successfully created such a file with snt2cooc.out.

And GIZA++ has run successfully and made a lot of files in my home
directory (!).

How do I redirect the output to a more suitable folder? With -outputpath?

Where can I find an explanation of the content of the files?

I suppose the dictionary is in the translation table *.t3.final.
Is there a convenient way to extract plain-text dictionaries (without
going one step further and using Moses)?
Is there a script available to decode the translation table using the
vocabulary files *.vcb?
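
A rough Python sketch of such a decoding step might look like the
following. It rests on assumptions about the file layouts that should be
checked against the actual files: each *.vcb line is taken to be
"id word count", and each *.t3.final line "source_id target_id
probability". The function names and the min_prob parameter are
hypothetical.

```python
def load_vocab(lines):
    """Map numeric word ids to words, assuming 'id word count' per line."""
    vocab = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or malformed lines
        vocab[int(fields[0])] = fields[1]
    return vocab


def decode_ttable(ttable_lines, src_vocab, tgt_vocab, min_prob=0.0):
    """Yield (source_word, target_word, probability) triples.

    Assumes 'source_id target_id probability' per line. Ids missing from
    a vocabulary are shown as '<id>' rather than guessed.
    """
    for line in ttable_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed layout
        src_id, tgt_id, prob = int(fields[0]), int(fields[1]), float(fields[2])
        if prob < min_prob:
            continue
        src = src_vocab.get(src_id, "<%d>" % src_id)
        tgt = tgt_vocab.get(tgt_id, "<%d>" % tgt_id)
        yield src, tgt, prob
```

Both functions take any iterable of lines, so they can be fed open file
objects for the two *.vcb files and the *.t3.final file, and the
resulting triples can be written out tab-separated as a plain text
dictionary.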

Yours,
Per Tunedal



On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote:

Hi Per,

if I am not mistaken, depending on how you compile GIZA++, it can either
generate the cooccurrence files on the fly during alignment, or you may
need to generate them before running the alignment. Actually, I think
that with the standard compilation you are in the second case. Have a
look here: [4]https://code.google.com/p/giza-pp/issues/detail?id=9
I hope the link is helpful!

Cheers,

Miquel.


2014-02-17 10:30 GMT+01:00 Per Tunedal <[5][email protected]>:

  Hi,
  I tried the procedure described at
  [6]http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough
  dictionary, but encountered the following error in the last step:
  ERROR: NO COOCURRENCE FILE GIVEN!
  Is one step missing in the procedure?
  Yours,
  Per Tunedal
  _______________________________________________
  Apertium-stuff mailing list
  [8][email protected]
  [9]https://lists.sourceforge.net/lists/listinfo/apertium-stuff


References

1. http://rali.iro.umontreal.ca/rali/?q=en/node/1325
2. https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in
3. mailto:[email protected]
4. https://code.google.com/p/giza-pp/issues/detail?id=9
5. mailto:[email protected]
6. http://wiki.apertium.org/wiki/Using_GIZA%2B%2B
7. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
8. mailto:[email protected]
9. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
10. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
11. mailto:[email protected]
12. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
13. http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
14. mailto:[email protected]
15. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
16. http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
17. mailto:[email protected]
18. https://lists.sourceforge.net/lists/listinfo/apertium-stuff