Found it. Thanks. Just what I needed.

Quoting Miles Osborne <[email protected]>:
> http://www.inf.ed.ac.uk/teaching/courses/mt/
>
> 2009/2/23 James Read <[email protected]>:
>> Thanks for that.
>>
>> I'm having difficulty finding the lecture notes on PK's homepage on
>> the UoE site. Do you have a direct link to them?
>>
>> Quoting Barry Haddow <[email protected]>:
>>
>>> Hi James
>>>
>>> There was an investigation into different methods of parallelising
>>> model 1 in this paper:
>>>
>>> Fully Distributed EM for Very Large Datasets
>>> by Jason Wolfe, Aria Haghighi and Dan Klein
>>> http://www.cs.berkeley.edu/~aria42/pubs/icml08-distributedem
>>>
>>> As for pseudo-code for the IBM models, check out Philipp Koehn's
>>> lecture notes on machine translation on the UoE site.
>>>
>>> cheers
>>> Barry
>>>
>>> On Friday 20 February 2009 13:51, Chris Dyer wrote:
>>>> Another architecture to consider is storing/distributing the ttable
>>>> from a single central repository. Most of the ttable is full of
>>>> crap, and for each sentence you know exactly which parameters will
>>>> be required in advance of running your E step. By not distributing
>>>> stuff that you don't need, you'll save a lot of effort, and the
>>>> amount of memory used by each individual compute node will be
>>>> radically reduced.
>>>>
>>>> Chris
>>>>
>>>> On Fri, Feb 20, 2009 at 1:41 PM, Qin Gao <[email protected]> wrote:
>>>>> Hi James,
>>>>>
>>>>> PGIZA++ does not use OpenMPI; it only uses shared storage to
>>>>> transfer model files, which can be a bottleneck. MGIZA++ just uses
>>>>> multi-threading. So neither is complete, and both can be further
>>>>> improved. The advantage of PGIZA++ is that it already decomposes
>>>>> every training step (E/M; model 1, HMM, models 3, 4, 5, etc.), so
>>>>> it should be easy for you to understand the logic.
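Chris's central-repository idea can be sketched concretely. The following is an illustrative toy (the sentence pairs are made up, and this is not PGIZA++ or GIZA++ code): since Model 1 only ever looks at t(f|e) for words that co-occur in a sentence pair, a worker can enumerate in advance exactly which t-table entries its chunk's E step will touch, and the repository only needs to ship those.

```python
def required_ttable_keys(sentence_pairs):
    """Return the set of (source, target) word pairs whose t(f|e)
    parameters the E step will need for this chunk."""
    needed = set()
    for src_words, tgt_words in sentence_pairs:
        # Model 1 considers every alignment, including to the empty
        # (NULL) source word, so each sentence pair needs t(f|e) for
        # the full within-sentence cross product.
        for f in tgt_words:
            for e in ["NULL"] + src_words:
                needed.add((e, f))
    return needed

# Toy chunk of two sentence pairs.
chunk = [(["das", "haus"], ["the", "house"]),
         (["das", "buch"], ["the", "book"])]
keys = required_ttable_keys(chunk)
```

A central repository would then send a worker only these entries rather than the full (often multi-gigabyte) t-table, which is the memory saving Chris describes.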
>>>>>
>>>>> The main bottleneck is the huge T-table (translation table), which
>>>>> is largest during model 1 training and then shrinks as training
>>>>> goes on; every child process has to get the table (typically
>>>>> several gigabytes for model 1 on large data). I think OpenMPI etc.
>>>>> is the right way to go, and please let me know if you have any
>>>>> questions on PGIZA++.
>>>>>
>>>>> Best,
>>>>> Qin
>>>>>
>>>>> On Thu, Feb 19, 2009 at 4:54 PM, James Read <[email protected]> wrote:
>>>>>> Wow!
>>>>>>
>>>>>> Thanks for that. That was great. I've had a quick read through
>>>>>> your paper. I'm guessing the basis of PGIZA++ is OpenMPI calls
>>>>>> and the basis of MGIZA++ is OpenMP calls, right?
>>>>>>
>>>>>> Your paper was very fascinating. You mentioned I/O bottlenecks
>>>>>> quite a lot with reference to PGIZA++, which is to be expected.
>>>>>> Did you run any experiments to find what those bottlenecks
>>>>>> typically are? How many processors did you hit before you started
>>>>>> to lose speed-up? Did this number vary for different data sets?
>>>>>>
>>>>>> Also, you mention breaking the files up into chunks and working
>>>>>> on them on different processors. Obviously you're referring to
>>>>>> some kind of data decomposition plan. Does your algorithm have
>>>>>> any kind of intelligent data decomposition strategy for reducing
>>>>>> communication? Or is it just a case of cutting the file up into
>>>>>> n pieces and assigning each one to a processor?
>>>>>>
>>>>>> The reason I ask is that our project would now have to come up
>>>>>> with some kind of superior data decomposition plan in order to
>>>>>> justify proceeding with the project.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> James
>>>>>>
>>>>>> Quoting Qin Gao <[email protected]>:
>>>>>>> Hi James,
>>>>>>>
>>>>>>> GIZA++ is a very typical EM algorithm, and you probably want to
>>>>>>> parallelize the E-step, since it takes much longer than the
>>>>>>> M-step.
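The "cut the file into n pieces" decomposition James asks about can be sketched in a few lines. This is a toy illustration (the corpus and the count statistic are made up; real code accumulates fractional Model 1 counts, not 1s): because E-step counts are sums over independent sentence pairs, n workers can each count their own chunk with a read-only model, and the only communication needed is one reduction (a sum) of the partial counts before the M step.

```python
from collections import Counter

def e_step_counts(sentence_pairs):
    """Stand-in for one worker's E step: any per-sentence count
    collection works, since Model 1 counts are sums over sentences."""
    counts = Counter()
    for src, tgt in sentence_pairs:
        for e in src:
            for f in tgt:
                counts[(e, f)] += 1  # toy statistic; real code adds t(f|e)/denom
    return counts

def chunked(corpus, n):
    """Cut the corpus into n near-equal chunks (one per processor)."""
    k = -(-len(corpus) // n)  # ceiling division
    return [corpus[i:i + k] for i in range(0, len(corpus), k)]

corpus = [(["das", "haus"], ["the", "house"]),
          (["ein", "buch"], ["a", "book"]),
          (["das", "buch"], ["the", "book"])]

# Each chunk can run on a different processor; afterwards the partial
# counts are summed (e.g. an MPI reduce) and the result is identical to
# a serial pass over the whole corpus.
partials = [e_step_counts(c) for c in chunked(corpus, 2)]
reduced = sum(partials, Counter())
assert reduced == e_step_counts(corpus)  # chunking changes nothing
```

An "intelligent" decomposition would go beyond this by, for example, balancing chunks by total sentence length rather than sentence count, but the communication pattern stays the same: one reduction per iteration.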
>>>>>>> You may want to check out the PGIZA++ and MGIZA++
>>>>>>> implementations, which you can download from my homepage:
>>>>>>>
>>>>>>> http://www.cs.cmu.edu/~qing
>>>>>>>
>>>>>>> You may also be interested in a paper describing the work:
>>>>>>>
>>>>>>> www.aclweb.org/anthology-new/W/W08/W08-0509.pdf
>>>>>>>
>>>>>>> Please let me know if there is anything I can help with.
>>>>>>>
>>>>>>> Best,
>>>>>>> Qin
>>>>>>>
>>>>>>> On Thu, Feb 19, 2009 at 4:12 PM, James Read <[email protected]> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> As the title suggests, I am involved in a project which may
>>>>>>>> involve parallelising the code of GIZA++ so that it will run
>>>>>>>> scalably on supercomputers on n processors. This would have
>>>>>>>> obvious benefits for any researchers making regular use of
>>>>>>>> GIZA++ who would like it to finish in minutes rather than
>>>>>>>> hours.
>>>>>>>>
>>>>>>>> The first step of such a project was profiling GIZA++ to see
>>>>>>>> where the executable spends most of its time on a typical run.
>>>>>>>> The profiling indicated a number of candidate functions, one of
>>>>>>>> which was model1::em_loop in model1.cpp.
>>>>>>>>
>>>>>>>> In order to parallelise such a function (using OpenMPI) it is
>>>>>>>> necessary to first come up with a data decomposition strategy
>>>>>>>> which minimises the latency of interprocessor communication
>>>>>>>> but ensures that the parallelisation has no side effects other
>>>>>>>> than running faster on a number of processors, up to some
>>>>>>>> optimal number where the latency of communication begins to
>>>>>>>> outweigh the benefit of throwing more processors at the job.
>>>>>>>>
>>>>>>>> In order to do this I am trying to gain an understanding of
>>>>>>>> the logic in the model1::em_loop function. However, intuitive
>>>>>>>> comments are lacking in the code.
>>>>>>>> Does anyone on this list have a good internal knowledge of
>>>>>>>> this function? Enough to give a rough outline of the logic it
>>>>>>>> contains in some kind of readable pseudocode?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> P.S. Apologies to anybody to whom this email was not of
>>>>>>>> interest.
>>>>>>>>
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered
>>>>>>>> in Scotland, with registration number SC005336.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> [email protected]
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>> --
>>>>>>> ==========================================
>>>>>>> Qin Gao
>>>>>>> Language Technologies Institute
>>>>>>> Carnegie Mellon University
>>>>>>> http://geek.kyloo.net
>>>>>>> Please help improving NLP articles on Wikipedia
>>>>>>> ==========================================
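For anyone else searching the archives for the rough outline being asked about: one EM iteration of Model 1 (the computation em_loop performs) can be written as runnable pseudocode. This is a simplified sketch, not a transcription of model1.cpp; it omits GIZA++'s perplexity bookkeeping, smoothing options and count-table classes.

```python
from collections import defaultdict

def model1_em_iteration(corpus, t):
    """One EM pass of IBM Model 1.
    corpus: list of (src_words, tgt_words) sentence pairs.
    t: dict mapping (e, f) to the current estimate of P(f | e).
    Returns the re-estimated t-table (the M step's output)."""
    count = defaultdict(float)   # expected number of e-f alignment links
    total = defaultdict(float)   # expected number of links out of e
    for src, tgt in corpus:
        src = ["NULL"] + src     # each f may also align to the empty word
        for f in tgt:
            # E step: the posterior that f aligns to e is t(f|e)
            # normalised by the sum of t(f|e') over all source positions
            denom = sum(t[(e, f)] for e in src)
            for e in src:
                frac = t[(e, f)] / denom
                count[(e, f)] += frac
                total[e] += frac
    # M step: renormalise the expected counts into probabilities
    return {(e, f): c / total[e] for (e, f), c in count.items()}

# Toy usage: two sentence pairs, uniform initialisation.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]
src_vocab = ["NULL", "das", "haus", "buch"]
tgt_vocab = ["the", "house", "book"]
t = {(e, f): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}
for _ in range(5):
    t = model1_em_iteration(corpus, t)
```

The per-sentence loop body reads only the (frozen) t-table from the previous iteration, which is exactly why the E step decomposes cleanly across processors: only the count and total accumulators need to be merged afterwards.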
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
