Yes, I am.

On Tue, Jan 19, 2016, 11:56 AM Mattmann, Chris A (3980) <[email protected]> wrote:
> Thanks, Martin. Filed:
>
> https://github.com/joshua-decoder/joshua/issues/239
>
> Are you interested in joining in the Incubation efforts?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Martin Gainty <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, January 19, 2016 at 7:57 AM
> To: "[email protected]" <[email protected]>, Lewis McGibbney <[email protected]>
> Subject: RE: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>
> >Dependency addition to Joshua-Decoder/Joshua/pom.xml:
> >
> ><!-- MCG added args4j to fix the compile error at
> >     /Joshua-Decoder/Joshua/src/joshua/decoder/ff/tm/CreateGlueGrammar.java:[17,26]
> >     package org.kohsuke.args4j does not exist -->
> ><dependency>
> >  <groupId>args4j</groupId>
> >  <artifactId>args4j</artifactId>
> >  <version>2.32</version>
> ></dependency>
> >
> >Could a Joshua-Decoder/Joshua committer please add this dependency to pom.xml? Thank you.
> >Martin
> >______________________________________________
> >
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]; [email protected]; [email protected]; [email protected]
> >> Subject: Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> >> Date: Tue, 19 Jan 2016 05:58:26 +0000
> >>
> >> Great, Hen, we’d love to have you on board as a mentor! Please add yourself to the proposal on the wiki.
> >>
> >> Is anyone else interested in machine translation? Any OpenNLP, Hadoop, Tika, or Lucene folks? I’m CC’ing the dev lists for visibility; please feel free to reply to [email protected].
> >>
> >> I’ll leave the DISCUSS thread open for a few more days.
> >>
> >> Cheers,
> >> Chris
> >>
> >> -----Original Message-----
> >> From: Henri Yandell <[email protected]>
> >> Reply-To: "[email protected]" <[email protected]>
> >> Date: Monday, January 18, 2016 at 7:57 PM
> >> To: jpluser <[email protected]>, "[email protected]" <[email protected]>
> >> Subject: Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> >>
> >> >Non-binding +1 to Joshua joining the Incubator. I'd be interested in mentoring.
> >> >
> >> >> -----Original Message-----
> >> >> From: jpluser <[email protected]>
> >> >> Reply-To: "[email protected]" <[email protected]>
> >> >> Date: Tuesday, January 12, 2016 at 10:56 PM
> >> >> To: "[email protected]" <[email protected]>
> >> >> Cc: "[email protected]" <[email protected]>
> >> >> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> >> >>
> >> >> >Hi Everyone,
> >> >> >
> >> >> >Please find attached for your viewing pleasure a proposed new project, Apache Joshua, a statistical machine translation toolkit. The proposal is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
> >> >> >
> >> >> >Proposal text is copied below. I’ll leave the discussion open for a week; we are interested in folks who would like to be initial committers and mentors. Please discuss here on the thread.
> >> >> >
> >> >> >Thanks!
> >> >> >
> >> >> >Cheers,
> >> >> >Chris (Champion)
> >> >> >
> >> >> >---
> >> >> >
> >> >> >= Joshua Proposal =
> >> >> >
> >> >> >== Abstract ==
> >> >> >[[joshua-decoder.org|Joshua]] is an open-source statistical machine translation toolkit. It includes a Java-based decoder for translating with phrase-based, hierarchical, and syntax-based translation models, a Hadoop-based grammar extractor (Thrax), and an extensive set of tools and scripts for training and evaluating new models from parallel text.
> >> >> >
> >> >> >== Proposal ==
> >> >> >Joshua is a state-of-the-art statistical machine translation system that provides a number of features:
> >> >> >
> >> >> > * Support for the two main paradigms in statistical machine translation: phrase-based and hierarchical / syntactic.
> >> >> > * A sparse feature API that makes it easy to add new feature templates supporting millions of features
> >> >> > * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> >> >> > * Support for lattice decoding, allowing upstream NLP tools to expose their hypothesis space to the MT system
> >> >> > * An efficient representation for models, allowing for quick loading of multi-gigabyte model files
> >> >> > * Fast decoding speed (on par with Moses and mtplz)
> >> >> > * Language packs: precompiled models that allow the decoder to be run as a black box
> >> >> > * Thrax, a Hadoop-based tool for learning translation models from parallel text
> >> >> > * A suite of tools for constructing new models for any language pair for which sufficient training data exists
> >> >> >
> >> >> >== Background and Rationale ==
> >> >> >A number of factors make this a good time for an Apache project focused on machine translation (MT): the quality of MT output (for many language pairs); the average computing resources available on computers, relative to the needs of MT systems; and the availability of a number of high-quality toolkits, together with a large base of researchers working on them.
> >> >> >
> >> >> >Over the past decade, machine translation (MT; the automatic translation of one human language to another) has become a reality. The research into statistical approaches to translation that began in the early nineties, together with the availability of large amounts of training data and better computing infrastructure, has produced translation results that are “good enough” for a large set of language pairs and use cases.
> >> >> >Free services like [[https://www.bing.com/translator|Bing Translator]] and [[https://translate.google.com|Google Translate]] have made these services available to the average person through direct interfaces and through tools like browser plugins, and sites across the world with greater translation needs use them to translate their pages automatically.
> >> >> >
> >> >> >MT does not require the infrastructure of large corporations in order to produce feasible output. Machine translation can be resource-intensive, but need not be prohibitively so. Disk and memory usage are mostly a matter of model size, which for most language pairs is a few gigabytes at most; models of that size can provide coverage on the order of tens or even hundreds of thousands of words in the input and output languages. The computational complexity of the algorithms used to search for translations of new sentences is typically linear in the number of words in the input sentence, making it possible to run a translation engine on a personal computer.
> >> >> >
> >> >> >The research community has produced many different open source translation projects for a range of programming languages and under a variety of licenses. These projects include the core “decoder”, which takes a model and uses it to translate new sentences between the language pair the model was defined for. They also typically include a large set of tools that enable new models to be built from large sets of example translations (“parallel data”) and monolingual texts.
> >> >> >These toolkits are usually built to support the agendas of the (largely) academic researchers who build them: the repeated cycle of building new models, tuning model parameters against development data, and evaluating them against held-out test data, using standard metrics for testing the quality of MT output.
> >> >> >
> >> >> >Together, these three factors (the quality of machine translation output, the feasibility of translating on standard computers, and the availability of tools to build models) make it reasonable for end users to use MT as a black-box service and to run it on their personal machines.
> >> >> >
> >> >> >These factors make it a good time for an organization with the status of the Apache Software Foundation to host a machine translation project.
> >> >> >
> >> >> >== Current Status ==
> >> >> >Joshua was originally ported from David Chiang’s Python implementation of Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins University. The current version is maintained by Matt Post at Johns Hopkins’ Human Language Technology Center of Excellence. Joshua has had many releases, with over 20 source code tags. The most recent release of Joshua was 6.0.5, on November 5th, 2015.
> >> >> >
> >> >> >== Meritocracy ==
> >> >> >The current developers are familiar with meritocratic open source development at Apache. Apache was chosen specifically because we want to encourage this style of development for the project.
> >> >> >
> >> >> >== Community ==
> >> >> >Joshua is used widely across the world. Perhaps its biggest (known) research / industrial user is the Amazon research group in Berlin. Another user is the US Army Research Lab.
> >> >> >No formal census has been undertaken, but posts to the Joshua technical support mailing list, along with occasional contributions, suggest small research and academic communities spread across the world, many of them in India.
> >> >> >
> >> >> >During incubation, we will explicitly seek to increase Joshua's usage across the board, including academic research, industry, and other end users interested in statistical machine translation.
> >> >> >
> >> >> >== Core Developers ==
> >> >> >The current set of core developers is fairly small, having shrunk with the graduation of some core student participants from Johns Hopkins. However, Joshua is used fairly widely, as mentioned above, and there remains a commitment from the principal researcher at Johns Hopkins to continue to use and develop it. A number of new community members have recently become interested in Joshua because of its projected use in several ongoing DARPA projects, such as XDATA and Memex.
> >> >> >
> >> >> >== Alignment ==
> >> >> >Joshua is currently Copyright (c) 2015, Johns Hopkins University, All rights reserved, and licensed under the BSD 2-clause license. It would of course be the intention to relicense this code under AL2.0, which would permit expanded and increased use of the software within Apache projects. There is currently an ongoing effort within the Apache Tika community to utilize Joshua within Tika’s Translate API; see [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
> >> >> >
> >> >> >== Known Risks ==
> >> >> >
> >> >> >=== Orphaned products ===
> >> >> >At the moment, regular contributions are made by a single contributor, the lead maintainer.
> >> >> >He (Matt Post) plans to continue development for the next few years, but this is still a single point of failure, since the graduate students who worked on the project have moved on to jobs, mostly in industry. However, our goal is to reduce this risk by growing the community at Apache, at a minimum with users and participants from NASA JPL.
> >> >> >
> >> >> >=== Inexperience with Open Source ===
> >> >> >The teams at both Johns Hopkins and NASA JPL have experience with many OSS projects at Apache and elsewhere. We understand "how it works" here at the foundation.
> >> >> >
> >> >> >== Relationships with Other Apache Products ==
> >> >> >Joshua includes dependencies on Hadoop, and is also included as a plugin in Apache Tika. We are also interested in coordinating with other projects, including Spark, and other projects needing MT services for language translation.
> >> >> >
> >> >> >== Developers ==
> >> >> >Joshua has only one regular developer, who is employed by Johns Hopkins University. NASA JPL (Mattmann and McGibbney) have been contributing lately, including a Homebrew formula and other contributions to the project through the DARPA XDATA and Memex programs.
> >> >> >
> >> >> >== Documentation ==
> >> >> >Documentation and publications related to Joshua can be found at joshua-decoder.org. The source for the Joshua documentation is currently hosted on GitHub at https://github.com/joshua-decoder/joshua-decoder.github.com
> >> >> >
> >> >> >== Initial Source ==
> >> >> >Current source resides on GitHub: github.com/joshua-decoder/joshua (the main decoder and toolkit) and github.com/joshua-decoder/thrax (the grammar extraction tool).
> >> >> >
> >> >> >== External Dependencies ==
> >> >> >Joshua has a number of external dependencies. Only BerkeleyLM (Apache 2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which is needed for translating sentences with pre-built models). The rest are dependencies for the build system and pipeline, used for constructing and training new models from parallel text.
> >> >> >
> >> >> >Apache projects:
> >> >> > * Ant
> >> >> > * Hadoop
> >> >> > * Commons
> >> >> > * Maven
> >> >> > * Ivy
> >> >> >
> >> >> >There are also a number of other open-source projects with various licenses that the project depends on, both dynamically (runtime) and statically.
> >> >> >
> >> >> >=== GNU GPL 2 ===
> >> >> > * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
> >> >> >
> >> >> >=== LGPL 2.1 ===
> >> >> > * KenLM: github.com/kpu/kenlm
> >> >> >
> >> >> >=== Apache 2.0 ===
> >> >> > * BerkeleyLM: https://code.google.com/p/berkeleylm/
> >> >> >
> >> >> >=== GNU GPL ===
> >> >> > * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
> >> >> >
> >> >> >== Required Resources ==
> >> >> > * Mailing Lists
> >> >> >  * [email protected]
> >> >> >  * [email protected]
> >> >> >  * [email protected]
> >> >> >
> >> >> > * Git Repos
> >> >> >  * https://git-wip-us.apache.org/repos/asf/joshua.git
> >> >> >
> >> >> > * Issue Tracking
> >> >> >  * JIRA Joshua (JOSHUA)
> >> >> >
> >> >> > * Continuous Integration
> >> >> >  * Jenkins builds on https://builds.apache.org/
> >> >> >
> >> >> > * Web
> >> >> >  * http://joshua.incubator.apache.org/
> >> >> >  * wiki at http://cwiki.apache.org
> >> >> >
> >> >> >== Initial Committers ==
> >> >> >The following is a list of the planned initial Apache committers (the active subset of the committers for the current repository on GitHub).
> >> >> >
> >> >> > * Matt Post ([email protected])
> >> >> > * Lewis John McGibbney ([email protected])
> >> >> > * Chris Mattmann ([email protected])
> >> >> >
> >> >> >== Affiliations ==
> >> >> >
> >> >> > * Johns Hopkins University
> >> >> >  * Matt Post
> >> >> >
> >> >> > * NASA JPL
> >> >> >  * Chris Mattmann
> >> >> >  * Lewis John McGibbney
> >> >> >
> >> >> >== Sponsors ==
> >> >> >=== Champion ===
> >> >> > * Chris Mattmann (NASA/JPL)
> >> >> >
> >> >> >=== Nominated Mentors ===
> >> >> > * Paul Ramirez
> >> >> > * Lewis John McGibbney
> >> >> > * Chris Mattmann
> >> >> >
> >> >> >== Sponsoring Entity ==
> >> >> >The Apache Incubator
