Hi,
Thanks for your advice. My plan is to prune the phrase table before
binarizing it. Pruning away unlikely phrase pairs seems to me to be a
sensible approach, and binarizing will then further decrease the memory
requirements.
What is the difference compared to using the compact phrase table?
Unfortunately, my efforts to build SALM have failed. I tried building
according to the instructions in the SALM readme file:
make allO64
and got the following error:
g++ -c -O -m64 -I../../Src/Shared -I../../Src/SuffixArrayApplications
-I../../Src/SuffixArrayApplications/SuffixArraySearch
-I../../Src/SuffixArrayApplications/SuffixArrayScan -I../../Src/Utils
-I../../Src/SuffixArrayApplications/SuffixArrayLanguageModel -o
Objs/Index/IndexSA.o64 ../../Src/IndexSA/IndexSA.cpp
../../Src/IndexSA/IndexSA.cpp: In function ‘int main(int, char**)’:
../../Src/IndexSA/IndexSA.cpp:46: error: ‘strcmp’ was not declared in
this scope
make: *** [Objs/Index/IndexSA.o64] Fel 1
Legend: 'Fel 1' = 'Error 1' (I get some error messages in Swedish.)
Yours,
Per Tunedal
On Wed, Apr 3, 2013, at 17:36, Philipp Koehn wrote:
> Hi,
>
> you should use the on-disk phrase table ("binarized phrase table")
> or compact phrase table and kenlm.
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc3
>
> -phi
>
> On Wed, Apr 3, 2013 at 2:36 PM, Per Tunedal <[email protected]>
> wrote:
> > Hi,
> > thanks for your advice. Cleaning the working directory did the trick.
> > Unfortunately, the model is now too large: I cannot translate in a
> > reasonable time because the model doesn't fit in memory, and swapping
> > is really slow.
> > Now it's time for pruning.
> > Yours,
> > Per Tunedal
> >
> > On Wed, Apr 3, 2013, at 9:40, Barry Haddow wrote:
> >> Hi Per
> >>
> >> You get these warnings:
> >>
> >> "WARNING: sentence 2448049 has alignment point (15, 19) out of
> >> bounds (15, ...)"
> >>
> >> when your alignments don't match your corpus. Most likely you have
> >> accidentally reused alignments from another run. The fast training is
> >> also a sign that something went wrong.
> >>
> >> Try again with a clean working directory.
> >>
> >> cheers - Barry
> >>
> >>
> >> On 03/04/13 07:43, Per Tunedal wrote:
> >> > Hi,
> >> > Inspired by the paper "Does more data always yield better translations?"
> >> > @ aclweb.org/anthology-new/E/E12/E12-1016.pdf, that Ken Fasano kindly
> >> > linked to, I've experimented a great deal.
> >> >
> >> > I've tested several ways to pick a good sample of sentences from the
> >> > Europarl corpus, taking 10 % of them. I thought I had just found a
> >> > promising method, so I decided to pick a larger sample, 35 %, and
> >> > expected a much improved translation. On the contrary, the
> >> > translation of my test text was terrible: it was turned into
> >> > garbage, completely useless.
> >> >
> >> > I trained the phrase model with:
> >> > nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> >> > train -corpus ~/corpora/Total1.sv-fr.clean_urval -f sv -e fr -alignment
> >> > grow-diag-final-and -reordering msd-bidirectional-fe -lm
> >> > 0:3:$HOME/lm/Total1.blm.fr:8 -external-bin-dir ~/mosesdecoder/tools
> >> > -parallel -cores 4 -score-options --GoodTuring >& training.out &
> >> >
> >> > The training was incredibly fast, in spite of the larger training
> >> > corpus.
> >> > After the line stating that moses.ini was created, I found lots of
> >> > warnings of this type:
> >> > "WARNING: sentence 2448049 has alignment point (15, 19) out of
> >> > bounds (15, ...)"
> >> >
> >> > Furthermore, the model (i.e. the model folder) is very small: 277
> >> > MB, with phrase-table.gz at 83 MB. The previous training with the
> >> > same sampling method (only 10 % of Europarl) yielded 495 MB, with
> >> > phrase-table.gz at 173 MB.
> >> >
> >> > Why this strange result? I suppose it has something to do with how
> >> > the phrases are actually extracted by Moses. The simple explanation
> >> > "phrases that are consistent with the word alignment" doesn't tell
> >> > me enough; besides, I don't fully understand what it means. Maybe a
> >> > very simple example would make me understand the process.
> >> >
> >> > Yours,
> >> > Per Tunedal
> >>
> >>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support