Hi Miquel,
thank you. I will give it a try.
Yours,
Per Tunedal
On Thu, Feb 20, 2014, at 19:48, Miquel Esplà wrote:
Well, of course you can try to manually replace the variables with paths
(as I told you, you have to replace the variables starting and
ending with __). I don't think I can help you much more because I have never
done this, but I'm sure that with a bit of patience you will manage ;)
Good luck!
Cheers,
Miquel.
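
The manual replacement suggested here could be sketched roughly as follows. This is only an illustration, assuming the template placeholders have the form __NAME__ (the thread shows __PREFIX__ and __PYTHON__); the function name fill_template, the example paths, and the sample template line are hypothetical and not part of bitextor.

```python
import re

# Assumed placeholder form: two underscores, an upper-case name, two underscores.
PLACEHOLDER = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def fill_template(text, values):
    """Replace every __NAME__ that has an entry in `values`.

    Placeholders without an entry are left untouched and reported in a
    sorted list, so that missing paths are easy to spot.
    """
    missing = set()

    def substitute(match):
        name = match.group(1)
        if name in values:
            return values[name]
        missing.add(name)
        return match.group(0)

    return PLACEHOLDER.sub(substitute, text), sorted(missing)


# Hypothetical values; adjust them to the local installation.
values = {"PREFIX": "/home/per/local", "PYTHON": "/usr/bin/python"}
# Made-up template line that mixes known and unknown placeholders.
line = 'perl "__PREFIX__/share/bitextor/utils/tokenizer.perl" | __PYTHON__ __OTHER__'
filled, missing = fill_template(line, values)
print(filled)
print(missing)
```

Reporting the placeholders that were not filled in is a deliberate choice: a leftover __NAME__ in the script fails only at run time, as the "__PYTHON__: not found" message later in this thread shows.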
2014-02-20 14:11 GMT+01:00 Per Tunedal <[email protected]>:
Hi Miquel,
Yes, that was what I had in mind. It doesn't help much, though.
The next dependency is some Python library for Levenshtein distance ...
There must be an easier way to test the script and see if it gives me
something useful. I'm not interested in testing the other functions
right now.
Just compile the script somehow? Or just hard-code paths into the
script?
Yours,
Per Tunedal
On Thu, Feb 20, 2014, at 10:46, Miquel Esplà wrote:
> Hi Per,
>
> I haven't tried to compile with the version of Python you are using, but you
> can try changing this condition (the Python version check) in configure.ac
> to do so.
>
> Cheers,
>
> Miquel.
>
>
> 2014-02-20 10:19 GMT+01:00 Per Tunedal <[email protected]>:
>
> > Hi Miquel,
> > Thanks for your thorough answer.
> >
> > I've tried ./autogen.sh
> > I had to install httrack, but then got:
> > checking for a Python interpreter with version >= 2.7... none
> > configure: error: You don't have Python 2.7 or later installed.
> >
> > Is it really necessary to update Python?
> >
> > It appears that the configure script demands Python >= 2.7. In Debian
> > Squeeze, Python 2.6.6 is the default.
> > I'm afraid of messing things up if I install Python manually rather than
> > with Synaptic. Lots of things depend on Python.
> >
> > And upgrading to Debian Wheezy might mess things up as well ...
> >
> > Yours,
> > Per Tunedal
> >
> >
> > On Wed, Feb 19, 2014, at 9:58, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > 2014-02-18 21:37 GMT+01:00 Per Tunedal <[email protected]>:
> >
> > Hi Miquel,
> > thank you. Looks like a good approach.
> >
> > Looking at the script:
> > It runs GIZA++ in both directions to begin with? I just have to supply the
> > bitext files?
> >
> >
> > Yes, you only need to provide the bitext files compressed with gzip.
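
As an illustration of the gzip requirement, here is a minimal sketch that writes a compressed copy of a corpus file using Python's standard gzip module. The function name gzip_file and the ".gz" naming are assumptions made for the example; the thread does not say what file names the script expects.

```python
import gzip
import shutil


def gzip_file(path):
    """Write a gzip-compressed copy of `path` next to it and return its name.

    The original file is left in place; the copy gets a ".gz" suffix
    (an assumption, not a documented requirement of the script).
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

For the corpus in this thread it would be called once per language file, for example gzip_file("/home/per/corpora/OpenOffice3.fr-sv.sv"), and the resulting .gz paths would then be passed to the script.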
> >
> >
> >
> > But the script has some trouble finding the GIZA++ files:
> > per@Pers-debian:~/script$ sh bitextor-builddics.in sv fr \
> >   "/home/per/corpora/OpenOffice3.fr-sv.sv" \
> >   "/home/per/corpora/OpenOffice3.fr-sv.fr" \
> >   "/home/per/block_world_corpus/GIZA++_wordlists/bitextor/OpenOffice3.gizadict.sv-fr"
> > TOKENISING THE CORPUS...
> > Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
> > gzip: /home/per/corpora/OpenOffice3.fr-sv.sv: not in gzip format
> > Can't open perl script "__PREFIX__/share/bitextor/utils/tokenizer.perl": No such file or directory
> > gzip: /home/per/corpora/OpenOffice3.fr-sv.fr: not in gzip format
> > LOWERCASING THE CORPUS...
> > FILTERING OUT TOO LONG SENTENCES...
> > FORMATTING THE CORPUS FOR PROCESSING...
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv_corpus.clean.fr.snt": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr_corpus.clean.sv.snt": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.sv.vcb": No such file or directory
> > mv: cannot stat "/tmp/tempcorpuspreproc.QP7LM/corpus.clean.fr.vcb": No such file or directory
> > BUILDING WORD CLASSES FOR IMPROVING ALIGNMENT...
> > CHECKING COOCURRENCE OF WORDS IN THE CORPUS...
> > BUILDING PROBABILISTIC DICTIONARIES...
> > FILTERING DICTIONARY...
> > egrep: /tmp/tempgizamodel.RlVVs/fr.vcb: No such file or directory
> > egrep: /tmp/tempgizamodel.RlVVs/sv.vcb: No such file or directory
> > bitextor-builddics.in: 173: __PYTHON__: not found
> > DONE!
> >
> >
> > I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in is
> > only the template of the script. What I didn't say is that you need to
> > compile the project to get the actual script. If you look at the
> > code of the template, you will see that there are many variables starting
> > and ending with "__" (such as __PREFIX__). These variables are
> > replaced by the corresponding paths at compilation time. So, to use the
> > script, you have to download the whole trunk directory and then run:
> > ./autogen.sh
> > ./configure
> > make
> > make install
> >
> > As you know, you can use the option --prefix=LOCALDIR when running
> > ./configure to install bitextor in a specific path (for example, LOCALDIR
> > could be /home/per/local/).
> >
> > Best,
> >
> > Miquel.
> >
> >
> >
> > Yours,
> > Per Tunedal
> >
> > On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > I think that the explanation on this website:
> > http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It
> > helps a lot to understand the structure and the content of each file
> > generated by OmegaT.
> >
> > About the script: in the last release of bitextor we included a script
> > called "bitextor-builddics" (you can find the template of this script here:
> > https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in)
> > which uses GIZA++ to obtain a plain-text bilingual dictionary, but only
> > including pairs of words fulfilling: a) both words occur at least 10 times
> > in the corpus, and b) the harmonic mean of their probabilities in both
> > probabilistic dictionaries (S -> T and T -> S) is higher than 0.2. If you
> > want to use this, I recommend using the version in the trunk, which
> > fixes some minor bugs still present in the release.
> >
> > Best,
> >
> > Miquel.
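
The two filtering criteria described above can be sketched as follows. This is not the code of bitextor-builddics; it is a small illustration under the assumption that the two probabilistic dictionaries and the word counts are already loaded into Python dictionaries. All names and the toy numbers are made up.

```python
def harmonic_mean(p, q):
    """Harmonic mean of two probabilities; 0.0 if both are zero."""
    if p + q == 0:
        return 0.0
    return 2.0 * p * q / (p + q)


def filter_pairs(s2t, t2s, src_counts, tgt_counts, min_count=10, threshold=0.2):
    """Keep the (source, target) pairs that satisfy both criteria.

    s2t maps (source, target) to the S -> T probability and
    t2s maps (target, source) to the T -> S probability.
    """
    kept = []
    for (src, tgt), p_st in sorted(s2t.items()):
        # a) both words must occur at least `min_count` times
        if src_counts.get(src, 0) < min_count or tgt_counts.get(tgt, 0) < min_count:
            continue
        # b) the harmonic mean of both probabilities must exceed `threshold`
        p_ts = t2s.get((tgt, src), 0.0)
        if harmonic_mean(p_st, p_ts) > threshold:
            kept.append((src, tgt))
    return kept


# Toy data, invented for the example.
s2t = {("hus", "maison"): 0.8, ("hus", "chat"): 0.1, ("bil", "voiture"): 0.9}
t2s = {("maison", "hus"): 0.6, ("chat", "hus"): 0.05, ("voiture", "bil"): 0.9}
src_counts = {"hus": 25, "bil": 3}
tgt_counts = {"maison": 30, "chat": 12, "voiture": 4}
print(filter_pairs(s2t, t2s, src_counts, tgt_counts))
```

In the toy data, "bil"/"voiture" is dropped because both words are too rare, and "hus"/"chat" is dropped because the harmonic mean of 0.1 and 0.05 is below 0.2, so only "hus"/"maison" remains.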
> >
> > 2014-02-17 14:21 GMT+01:00 Per Tunedal <[email protected]>:
> >
> > Hi Miquel,
> > thank you for your informative answer. Indeed, I needed to create a
> > cooccurrence file.
> > I successfully created such a file with snt2cooc.out
> >
> > And GIZA++ has run successfully and made a lot of files in my home
> > directory (!).
> >
> > How do I redirect the output to a more suitable folder? -outputpath?
> >
> > Where can I find an explanation of the content of the files?
> >
> > I suppose the dictionary is in the translation table *.t3.final
> > Any convenient way to extract plain text dictionaries (without going one
> > step further and using Moses)?
> > Is there some script available to decode the translation table using the
> > vocabulary files *.vcb?
> >
> > Yours,
> > Per Tunedal
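
A rough sketch of the decoding asked about here, under stated assumptions: each *.vcb line is taken to be "id word count", each *.t3.final line is taken to be "source_id target_id probability", and id 0 is taken to be the empty (NULL) word. These formats should be checked against the GIZA++ documentation and the actual files; the function names are hypothetical.

```python
def read_vocab(lines):
    """Map word ids to words from lines assumed to look like 'id word count'."""
    vocab = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            vocab[int(fields[0])] = fields[1]
    return vocab


def lookup(vocab, word_id):
    """Return the word for `word_id`; id 0 is assumed to be the NULL word."""
    if word_id == 0:
        return "NULL"
    return vocab.get(word_id, "<unknown id %d>" % word_id)


def decode_ttable(lines, src_vocab, tgt_vocab):
    """Yield (source_word, target_word, probability) triples.

    Lines that do not have exactly three fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        src_id, tgt_id, prob = int(fields[0]), int(fields[1]), float(fields[2])
        yield lookup(src_vocab, src_id), lookup(tgt_vocab, tgt_id), prob


# Toy input, invented for the example.
sv = read_vocab(["2 hus 25", "3 bil 3"])
fr = read_vocab(["2 maison 30", "3 voiture 4"])
for src, tgt, prob in decode_ttable(["0 2 0.01", "2 2 0.8", "3 3 0.9"], sv, fr):
    print("%s\t%s\t%s" % (src, tgt, prob))
```

With real data, the lists would be replaced by open file objects for the two vocabulary files and the translation table. Which vocabulary plays the source role depends on the direction in which GIZA++ was run.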
> >
> >
> >
> > On Mon, Feb 17, 2014, at 11:08, Miquel Esplà wrote:
> >
> > Hi Per,
> >
> > if I am not wrong, depending on how you compile GIZA++, it can generate
> > the cooccurrence files on the fly during alignment, or you may need to do so
> > before running the alignment. Actually, I think that, with the standard
> > compilation, you are in the second case. Have a look here:
> > https://code.google.com/p/giza-pp/issues/detail?id=9 I hope the link will
> > be helpful!
> >
> > Cheers,
> >
> > Miquel.
> >
> > 2014-02-17 10:30 GMT+01:00 Per Tunedal <[email protected]>:
> >
> >
> > Hi,
> > I tried the procedure described at
> > http://wiki.apertium.org/wiki/Using_GIZA%2B%2B to get a rough
> > dictionary, but encountered the following error in the last step:
> >
> > ERROR: NO COOCURRENCE FILE GIVEN!
> >
> > Is one step missing in the procedure?
> >
> > Yours,
> > Per Tunedal
> >
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Android apps run on BlackBerry 10
> > Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
> > Now with support for Jelly Bean, Bluetooth, Mapview and more.
> > Get your Android app in front of a whole new audience. Start now.
> > http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff