Wow, thanks for that; it was great. I've had a quick read through your paper. Am I right in guessing that PGIZA++ is built on OpenMPI calls and MGIZA++ on OpenMP calls?
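
Whatever the underlying mechanism (message passing between processes or threads sharing memory), parallelising the E-step of IBM Model 1, the model in model1.cpp, can be organised around one pattern. Each worker computes expected counts over its own slice of the corpus, the partial counts are summed, and the M-step renormalises the sums into new translation probabilities t(f|e). The Python sketch below illustrates that pattern under simplifying assumptions. It is not GIZA++'s code, and the names `e_step`, `merge` and `train_model1` are hypothetical. Chunks are processed one after another here, but each `e_step` call is independent and could be handed to a separate MPI process or thread.

```python
from collections import defaultdict

NULL = "NULL"  # empty source word, so a target word can align to nothing


def e_step(chunk, t):
    """Collect expected counts for one chunk of sentence pairs."""
    count = defaultdict(float)  # expected number of (f, e) links
    total = defaultdict(float)  # expected number of links from e
    for e_sent, f_sent in chunk:
        e_words = [NULL] + e_sent
        for f in f_sent:
            # Model 1 posterior over which source word generated f:
            # proportional to t(f|e), with no dependence on positions.
            z = sum(t[(f, e)] for e in e_words)
            for e in e_words:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    return count, total


def merge(partials):
    """Sum partial counts from every chunk (the reduction step)."""
    count, total = defaultdict(float), defaultdict(float)
    for part_count, part_total in partials:
        for key, value in part_count.items():
            count[key] += value
        for key, value in part_total.items():
            total[key] += value
    return count, total


def train_model1(corpus, n_chunks, iterations=5):
    """Run Model 1 EM with the E-step split over contiguous chunks."""
    f_vocab = {f for _, f_sent in corpus for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    size = -(-len(corpus) // n_chunks)  # ceiling division
    chunks = [corpus[i:i + size] for i in range(0, len(corpus), size)]
    for _ in range(iterations):
        # E-step: each call reads only its own chunk plus the shared t.
        count, total = merge(e_step(chunk, t) for chunk in chunks)
        # M-step: t(f|e) = count(f, e) / total(e).
        t = {(f, e): n / total[e] for (f, e), n in count.items()}
    return t
```

Because the summed counts do not depend on how the corpus is split, training with one chunk or with three gives the same table up to floating-point rounding. In a distributed setting, the data exchanged per iteration would be the partial count tables and the updated t table, which is where the communication cost comes from.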

Your paper was fascinating. You mention I/O bottlenecks quite a lot with reference to PGIZA++, which is to be expected. Did you run any experiments to find out what those bottlenecks typically are? At how many processors did the speed-up stop improving? Did that number vary between data sets?

Also, you mention breaking the files up into chunks and working on them on different processors, so you obviously have some kind of data decomposition plan. Does your algorithm use an intelligent decomposition strategy to reduce communication, or does it simply cut the file into n pieces and assign one to each processor? The reason I ask is that our project would now have to come up with a superior data decomposition plan in order to justify going ahead.

Thanks
James

Quoting Qin Gao <[email protected]>:

> Hi James,
>
> GIZA++ is a very typical EM algorithm, and you probably want to
> parallelize the E-step, since it takes longer than the M-step. You may
> want to check out the PGIZA++ and MGIZA++ implementations, which you can
> download from my homepage:
>
> http://www.cs.cmu.edu/~qing
>
> You may also be interested in a paper describing the work:
>
> www.aclweb.org/anthology-new/W/W08/W08-0509.pdf
>
> Please let me know if there is anything I can help with.
>
> Best,
> Qin
>
> On Thu, Feb 19, 2009 at 4:12 PM, James Read <[email protected]> wrote:
>
>> Hi all,
>>
>> as the title suggests, I am involved in a project which may involve
>> parallelising the code of GIZA++ so that it runs scalably on
>> supercomputers with n processors. This would have obvious benefits
>> for any researchers who use GIZA++ regularly and would like it to
>> finish in minutes rather than hours.
>>
>> The first step of such a project was profiling GIZA++ to see where
>> the executable spends most of its time on a typical run. Profiling
>> indicated a number of candidate functions, one of which was
>> model1::em_loop in the model1.cpp file.
>>
>> To parallelise such a function (using OpenMPI), we first need a data
>> decomposition strategy that minimizes the latency of interprocessor
>> communication while guaranteeing that the only effect of the
>> parallelisation is faster execution, up to the optimal number of
>> processors beyond which communication latency outweighs the benefit
>> of adding more processors.
>>
>> To do this, I am trying to understand the logic of the
>> model1::em_loop function, but the code lacks explanatory comments.
>> Does anyone on this list know this function well enough to give a
>> rough outline of its logic in readable pseudocode?
>>
>> Thanks
>>
>> P.S. Apologies to anybody to whom this email was not of interest.
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
> --
> ==========================================
> Qin Gao
> Language Technologies Institute
> Carnegie Mellon University
> http://geek.kyloo.net
> ------------------------------------------------------------------------------------
> Please help improve NLP articles on Wikipedia
> ==========================================
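
On the decomposition question: in Model 1 the E-step work for one sentence pair grows roughly with (l+1)·m, where l and m are the source and target sentence lengths, because every target word is scored against every source word plus NULL. Cutting the file into n equal line counts can therefore leave processors unevenly loaded when sentence lengths vary. One possible alternative, sketched below as an assumption and not as what PGIZA++ or MGIZA++ actually do, is the greedy longest-processing-time (LPT) heuristic: sort pairs by estimated cost and give each to the currently least-loaded worker. The functions `pair_cost`, `naive_split`, `lpt_split` and `max_load` are hypothetical names for illustration.

```python
import heapq


def pair_cost(e_len, f_len):
    """Approximate Model 1 E-step work for one sentence pair:
    each target word is scored against every source word plus NULL."""
    return (e_len + 1) * f_len


def naive_split(lengths, n):
    """Contiguous chunks with (almost) equal numbers of sentence pairs."""
    size = -(-len(lengths) // n)  # ceiling division
    return [list(range(i, min(i + size, len(lengths))))
            for i in range(0, len(lengths), size)]


def lpt_split(lengths, n):
    """Greedy LPT: assign the most expensive pairs first, each to the
    worker with the smallest load so far."""
    # Python's sort is stable, so equal-cost pairs keep their file order.
    order = sorted(range(len(lengths)),
                   key=lambda i: pair_cost(*lengths[i]), reverse=True)
    heap = [(0, w) for w in range(n)]  # (current load, worker id)
    chunks = [[] for _ in range(n)]
    for i in order:
        load, w = heapq.heappop(heap)
        chunks[w].append(i)
        heapq.heappush(heap, (load + pair_cost(*lengths[i]), w))
    return [sorted(c) for c in chunks]


def max_load(lengths, chunks):
    """An iteration finishes when the busiest worker finishes."""
    return max(sum(pair_cost(*lengths[i]) for i in c) for c in chunks)
```

For two long and four short pairs split across two workers, the naive split puts both long pairs on one worker (maximum load 3310), while LPT spreads them (maximum load 1700). This only balances computation: how much each worker has to communicate depends on how many distinct (f, e) pairs its chunk touches, which this sketch does not model, and LPT chunks are no longer contiguous in the file, which could matter given the I/O bottlenecks mentioned above.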
