Very useful. Adding some more resources, available at http://kbcs.in/tools.html

On Tue, Nov 25, 2014 at 4:33 PM, <[email protected]> wrote:
> Send Moses-support mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.mit.edu/mailman/listinfo/moses-support
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Moses-support digest..."
>
> Today's Topics:
>
>    1. SMT resources for Indian languages (Anoop (?????))
>    2. Re: (no subject) (Hieu Hoang)
>    3. CFP EAMT 2015: 18th Annual Conference of the European Association for Machine Translation (Felipe Sánchez Martínez)
>    4. Re: Too large language models - how to handle that? (Hoang Cuong)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Nov 2014 07:59:46 +0530
> From: Anoop (?????) <[email protected]>
> Subject: [Moses-support] SMT resources for Indian languages
> To: [email protected]
> Message-ID: <cadxxmydi98xs8kz6w8c0oevzygb9_faxvb02bl9+-wto9zz...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Sharing a few SMT resources for Indian languages.
>
> Center For Indian Language Technology <http://www.cfilt.iitb.ac.in>, IIT Bombay has hosted Shata-Anuvaadak (100 Translators), a Statistical Machine Translation system for Indian languages. It currently supports translation between 11 Indian languages:
>
>    - Indo-Aryan languages: Hindi, Urdu, Bengali, Gujarati, Punjabi, Marathi, Konkani
>    - Dravidian languages: Tamil, Telugu, Malayalam
>    - English
>
> It is a Phrase-Based MT system with pre-processing and post-processing extensions. The pre-processing includes source-side reordering for English to Indian language translation. The post-processing includes transliteration between Indian languages for OOV words.
> The system can be accessed at:
>
> http://www.cfilt.iitb.ac.in/indic-translator
>
> For more details, see the following publication:
>
> Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya. 2014. *Shata-Anuvadak: Tackling Multiway Translation of Indian Languages*. Language Resources and Evaluation Conference *(LREC 2014)*.
>
> We are also making available software and resources developed in the Center for the system and for ongoing research. These are available under an open source license for research use. These include:
>
> *Software*
>
>    - Indian language NLP tools: common NLP tools for Indian languages that are useful for machine translation (Unicode normalizers, tokenizers, morphology analysers and transliteration systems)
>    - Source-side reordering system for SMT
>    - A simple experiment management system for Moses
>
> *Resources*
>
>    - Translation models for phrase-based SMT systems for all language pairs in Shata-anuvaadak
>    - Language models for all languages in Shata-anuvaadak
>    - Transliteration models for some language pairs (Moses-based)
>
> You can access these resources at:
>
> http://www.cfilt.iitb.ac.in/static/download.html
>
> Regards,
> Anoop.
>
> http://www.cse.iitb.ac.in/~anoopk
>
> ------------------------------
>
> Message: 2
> Date: Tue, 25 Nov 2014 09:10:06 +0000
> From: Hieu Hoang <[email protected]>
> Subject: Re: [Moses-support] (no subject)
> To: Daramola Olaife <[email protected]>, [email protected], [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="windows-1252"
>
> I'm getting a different error when compiling irstlm5.80.06 with the latest moses from github.
> moses/LM/IRST.cpp:60:21: error: invalid use of incomplete type 'class lmContainer'
>    if (m_lmtb) m_lmtb->reset_mmap();
>
> Using irstlm5.80.03 works fine
> http://sourceforge.net/projects/irstlm/files/irstlm/irstlm-5.80/
>
> On 24/11/14 12:50, Daramola Olaife wrote:
> > After installing irstlm, I tried linking it to moses with
> > ./bjam --with-irstlm=/home/olaife/irstlm-5.80.06 -j8
> > but it was giving me error.
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Nov 2014 10:12:27 +0100
> From: Felipe Sánchez Martínez <[email protected]>
> Subject: [Moses-support] CFP EAMT 2015: 18th Annual Conference of the European Association for Machine Translation
> To: [email protected], moses-support <[email protected]>, [email protected], [email protected]
> Cc: "awa >> Andy Way" <[email protected]>, "Mikel L. Forcada" <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Apologies for cross-posting.
> -----------------------------------------------------------
>
> *18th Annual Conference of the European Association for Machine Translation (EAMT 2015; Antalya, Turkey)*
>
> The European Association for Machine Translation (EAMT, http://www.eamt.org) invites everyone interested in machine translation, translation-related tools and resources to participate in this conference - developers, researchers, users, professional translators and translation/localisation managers: anyone who has a stake in the vision of an information world in which language barriers and issues become less visible to the information consumer.
> We especially invite researchers to describe the state of the art and demonstrate their cutting-edge results, and professional MT users to share their experiences.
>
> EAMT 2015, the 18th Annual Conference of the European Association for Machine Translation, will be held in Antalya, Turkey from 11 to 13 May 2015.
>
> We expect to receive manuscripts in these three categories:
>
> ------------------------------------
> Research papers
> ------------------------------------
> Long-paper submissions (8 pages) are invited for reports of significant research results in any aspect of machine translation and related areas. Such reports should include a substantial evaluation component, or have a strong theoretical and/or methodological contribution where results and in-depth evaluations may not be appropriate. Papers are welcome on all topics in the area of Machine Translation or translation-related technologies, including:
>
> * Speech translation: speech to text, speech to speech
> * Translation aids (translation memory, terminology databases, etc.)
> * Translation environments (workflow, support tools, conversion tools for lexica, etc.)
> * Practical MT systems (MT for professionals, MT for multilingual eCommerce, MT for localization, etc.)
> * MT in multilingual public service (eGovernment etc.)
> * MT for the web
> * MT embedded in other services
> * MT evaluation techniques and evaluation results
> * Dictionaries and lexica for MT
> * Text and speech corpora for MT
> * Standards in text and lexicon encoding for MT
> * Human factors in MT and user interfaces
> * Related multilingual technologies (natural language generation, information retrieval, text categorization, text summarization, information extraction, etc.)
>
> Papers should describe original work. They should emphasize completed work rather than intended work, and should indicate clearly the state of completion of the reported results.
> Where appropriate, concrete evaluation results should be included.
>
> ------------------------------------
> User studies
> ------------------------------------
> Short-paper submissions (2-4 pages) are invited for reports on users' experiences with MT, be it in small or medium size business (SMB), enterprise, government, or NGOs. Contributions are welcome on:
>
> * Integrating MT and computer-assisted translation into a translation production workflow (e.g. transforming terminology glossaries into MT resources, optimizing TM/MT thresholds, mixing online and offline tools, using interactive MT, dealing with MT confidence scores);
> * Use of MT to improve translation or localization workflows (e.g. reducing turnaround times, improving translation consistency, increasing the scope of globalization projects);
> * Managing change when implementing and using MT (e.g. switching between multiple MT systems, limiting degradations when updating or upgrading an MT system);
> * Implementing open-source MT in the SMB or enterprise (e.g. strategies to get support, reports on taking pilot results into full deployment, examples of advanced customisation sought and obtained thanks to the open-source paradigm, collaboration within open-source MT projects);
> * Evaluation of MT in a real-world setting (e.g. error detection strategies employed, metrics used, productivity or translation quality gains achieved);
> * Post-editing strategies and tools (e.g. limitations of traditional translation quality assurance tools, challenges associated with post-editing guidelines);
> * Legal issues associated with MT, especially MT in the cloud (e.g. copyright, privacy);
> * Use of MT in social networking or real-time communication (e.g. enterprise support chat, multilingual content for social media);
> * Use of MT to process multilingual content for assimilation purposes (e.g. cross-lingual information retrieval, MT for e-discovery or spam detection, MT for highly dynamic content);
> * Use of standards for MT.
>
> Papers should highlight problems and solutions and not merely describe the MT integration process or project settings. Where solutions do not seem to exist, suggestions for MT researchers and developers should be clearly emphasized. For user papers produced by academics, we require co-authorship with the actual users.
>
> ------------------------------------
> Project/Product description
> ------------------------------------
> Abstract submissions (1 page) are invited to report new, interesting:
>
> * Tools for machine translation, computer aided translation, and the like (including commercial products and open-source software). The authors should be ready to present the tools in the form of demos or posters during the conference.
> * Research projects related to machine translation. The authors should be ready to present the projects in the form of posters during the conference. This follows on from the successful "project villages" held at the last two EAMT conferences.
>
> ------------------------------------
> Programme
> ------------------------------------
> The programme will include oral presentations and poster sessions. Accepted papers may be assigned to an oral or poster session, but no differentiation will be made in the conference proceedings.
>
> ------------------------------------
> Important Dates
> ------------------------------------
> * Paper submission: February 5, 2015
> * Notification to authors: March 12, 2015
> * Camera-ready deadline: April 2, 2015
> * Conference: May 11-13, 2015
>
> ------------------------------------
> Conference website
> ------------------------------------
> http://www.eamt2015.org/
>
> For further information about this call for papers please contact the track chairs at [email protected] and put in the title "[user]" or "[research]" depending on which track your question is related to. For questions about the organisation (venue, registration, accommodation, etc.) please contact the local organisers at [email protected].
>
> Kind regards
> --
> Gema Ramírez-Sánchez, Fred Hollowood and Felipe Sánchez-Martínez
> on behalf of the EAMT 2015 Organising Committee
>
> ------------------------------
>
> Message: 4
> Date: Tue, 25 Nov 2014 12:02:32 +0100
> From: Hoang Cuong <[email protected]>
> Subject: Re: [Moses-support] Too large language models - how to handle that?
> To: Marcin Junczys-Dowmunt <[email protected]>
> Cc: [email protected]
> Message-ID: <CAG1fz7d=[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Raj, Tom and Marcin,
> I binarized the ARPA file last night, following your suggestion. In the end, it resulted in a binarized LM file of roughly *100GB* (@Marcin - it is not 20-30GB as you suggested; is this size okay?)
> Fortunately, the infrastructure at my university allows me to run experiments with that.
> Thanks a lot for your help.
> It is so great to play with such huge LMs :))
> Best,
>
> On Mon, Nov 24, 2014 at 3:19 PM, Marcin Junczys-Dowmunt <[email protected]> wrote:
>
> > The command
> >
> > moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
> >
> > will build a compressed binarized model with quantization.
> > You can run
> >
> > moses/bin/build_binary lm.arpa
> >
> > without any parameters to get size estimates for different parameter settings. I would guess you will get a binarized LM of roughly 20 to 30 GB, which is manageable (provided the size you gave us is that of an uncompressed text file). You can also use lmplz to build pruned models in the first place; these will be much smaller.
> >
> > On 2014-11-24 15:11, Tom Hoar wrote:
> >
> > After binarizing such a large ARPA file with KenLM, you'll need to configure your moses.ini file to "lazily load the model using mmap." This involves using lmodel-file code "9" vs code "8." More details here: https://kheafield.com/code/kenlm/moses/
> >
> > Performance improves significantly if you store the binarized file on an SSD.
> >
> > On 11/24/2014 07:00 PM, Raj Dabre wrote:
> >
> > Hey Hoang,
> > You should binarize the arpa file.
> > The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how.
> > Regards.
> >
> > On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]> wrote:
> >
> >> Hi all,
> >> I have trained an (unpruned) 5-gram language model on a large corpus of 5 billion words, resulting in an ARPA-format file of roughly 300GB (is this a normal LM size for such big monolingual data?). This is obviously too big for running an SMT system.
> >> I have read several papers whose systems use language models trained on similar monolingual corpora. Could you give me some advice on how to handle this, making it feasible to run SMT systems?
> >> I appreciate your help a lot,
> >> Best,
> >> --
> >> Best Regards,
> >> Hoang Cuong
> >> SMTNerd
> >
> > --
> > Raj Dabre.
> > Research Student,
> > Graduate School of Informatics,
> > Kyoto University.
> > CSE MTech, IITB., 2011-2014
>
> --
> *Best Regards,*
> *Hoang Cuong*
> *SMTNerd*
>
> ------------------------------
>
> End of Moses-support Digest, Vol 97, Issue 77
> *********************************************

--
Regards:
राज नाथ पटेल/Raj Nath Patel
http://kbcs.in/
