Yes, I am.

On Tue, Jan 19, 2016, 11:56 AM Mattmann, Chris A (3980) <
[email protected]> wrote:

> Thanks Martin, filed:
>
> https://github.com/joshua-decoder/joshua/issues/239
>
>
> Are you interested in joining in the Incubation efforts?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> -----Original Message-----
> From: Martin Gainty <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, January 19, 2016 at 7:57 AM
> To: "[email protected]" <[email protected]>, Lewis McGibbney
> <[email protected]>
> Subject: RE: [DISCUSS] Apache Joshua Incubator Proposal - Machine
> Translation Toolkit
>
> >Dependency addition to Joshua-Decoder/Joshua/pom.xml:
> >
> >    <!-- MCG added args4j to fix the compile error in
> >         /Joshua-Decoder/Joshua/src/joshua/decoder/ff/tm/CreateGlueGrammar.java:[17,26]:
> >         package org.kohsuke.args4j does not exist -->
> >    <dependency>
> >      <groupId>args4j</groupId>
> >      <artifactId>args4j</artifactId>
> >      <version>2.32</version>
> >    </dependency>
> >
> >Could a Joshua-Decoder/Joshua committer please add this dependency to
> >pom.xml? Thank you.
> >Martin
> >______________________________________________
> >
> >
> >
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]; [email protected]; [email protected];
> >>[email protected]
> >> Subject: Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine
> >>Translation Toolkit
> >> Date: Tue, 19 Jan 2016 05:58:26 +0000
> >>
> >> Great, Hen, we’d love to have you on board as a mentor! Please
> >> add yourself to the proposal on the wiki.
> >>
> >> Does anyone else have an interest in Machine Translation? Any OpenNLP
> >> folks, Hadoop folks, Tika, or Lucene folks? I’m CC’ing the dev lists for
> >> visibility; please feel free to reply to [email protected].
> >>
> >> I’ll leave the DISCUSS thread open for a few more days.
> >>
> >> Cheers,
> >> Chris
> >>
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Henri Yandell <[email protected]>
> >> Reply-To: "[email protected]" <[email protected]>
> >> Date: Monday, January 18, 2016 at 7:57 PM
> >> To: jpluser <[email protected]>,
> >> "[email protected]" <[email protected]>
> >> Subject: Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine
> >> Translation Toolkit
> >>
> >> >Non-binding +1 to Joshua joining the Incubator. I'd be interested in
> >> >mentoring.
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: jpluser <[email protected]>
> >> >> Reply-To: "[email protected]" <[email protected]>
> >> >> Date: Tuesday, January 12, 2016 at 10:56 PM
> >> >> To: "[email protected]" <[email protected]>
> >> >> Cc: "[email protected]" <[email protected]>
> >> >> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine
> >> >> Translation Toolkit
> >> >>
> >> >> >Hi Everyone,
> >> >> >
> >> >> >Please find attached for your viewing pleasure a proposed new
> >> >> >project, Apache Joshua, a statistical machine translation toolkit. The
> >> >> >proposal is in wiki draft form at:
> >> >> >https://wiki.apache.org/incubator/JoshuaProposal
> >> >> >
> >> >> >Proposal text is copied below. I’ll leave the discussion open for a
> >> >> >week. We are interested in folks who would like to be initial
> >> >> >committers and mentors, so please discuss here on the thread.
> >> >> >
> >> >> >Thanks!
> >> >> >
> >> >> >Cheers,
> >> >> >Chris (Champion)
> >> >> >
> >> >> >———
> >> >> >
> >> >> >= Joshua Proposal =
> >> >> >
> >> >> >== Abstract ==
> >> >> >[[joshua-decoder.org|Joshua]] is an open-source statistical machine
> >> >> >translation toolkit. It includes a Java-based decoder for translating
> >> >> >with phrase-based, hierarchical, and syntax-based translation models, a
> >> >> >Hadoop-based grammar extractor (Thrax), and an extensive set of tools
> >> >> >and scripts for training and evaluating new models from parallel text.
> >> >> >
> >> >> >== Proposal ==
> >> >> >Joshua is a state-of-the-art statistical machine translation system
> >> >> >that provides a number of features:
> >> >> >
> >> >> > * Support for the two main paradigms in statistical machine
> >> >> >   translation: phrase-based and hierarchical / syntactic.
> >> >> > * A sparse feature API that makes it easy to add new feature
> >> >> >   templates, supporting millions of features.
> >> >> > * Native implementations of many tuners (MERT, MIRA, PRO, and
> >> >> >   AdaGrad).
> >> >> > * Support for lattice decoding, allowing upstream NLP tools to
> >> >> >   expose their hypothesis space to the MT system.
> >> >> > * An efficient representation for models, allowing for quick loading
> >> >> >   of multi-gigabyte model files.
> >> >> > * Fast decoding speed (on par with Moses and mtplz).
> >> >> > * Language packs: precompiled models that allow the decoder to be run
> >> >> >   as a black box.
> >> >> > * Thrax, a Hadoop-based tool for learning translation models from
> >> >> >   parallel text.
> >> >> > * A suite of tools for constructing new models for any language pair
> >> >> >   for which sufficient training data exists.
> >> >> >
> >> >> >== Background and Rationale ==
> >> >> >A number of factors make this a good time for an Apache project
> >> >> >focused on machine translation (MT): the quality of MT output (for
> >> >> >many language pairs); the average computing resources available on
> >> >> >computers, relative to the needs of MT systems; and the availability
> >> >> >of a number of high-quality toolkits, together with a large base of
> >> >> >researchers working on them.
> >> >> >
> >> >> >Over the past decade, machine translation (MT; the automatic
> >> >> >translation of one human language to another) has become a reality.
> >> >> >The research into statistical approaches to translation that began in
> >> >> >the early nineties, the availability of large amounts of training
> >> >> >data, and better computing infrastructure have all come together to
> >> >> >produce translation results that are “good enough” for a large set of
> >> >> >language pairs and use cases. Free services like
> >> >> >[[https://www.bing.com/translator|Bing Translator]] and
> >> >> >[[https://translate.google.com|Google Translate]] have made these
> >> >> >services available to the average person through direct interfaces and
> >> >> >through tools like browser plugins, and sites across the world with
> >> >> >higher translation needs use them to translate their pages
> >> >> >automatically.
> >> >> >
> >> >> >MT does not require the infrastructure of large corporations in order
> >> >> >to produce feasible output. Machine translation can be
> >> >> >resource-intensive, but need not be prohibitively so. Disk and memory
> >> >> >usage are mostly a matter of model size, which for most language pairs
> >> >> >is a few gigabytes at most; models of that size can provide coverage on
> >> >> >the order of tens or even hundreds of thousands of words in the input
> >> >> >and output languages. The computational complexity of the algorithms
> >> >> >used to search for translations of new sentences is typically linear
> >> >> >in the number of words in the input sentence, making it possible to
> >> >> >run a translation engine on a personal computer.
> >> >> >
> >> >> >The research community has produced many different open source
> >> >> >translation projects for a range of programming languages and under a
> >> >> >variety of licenses. These projects include the core “decoder”, which
> >> >> >takes a model and uses it to translate new sentences between the
> >> >> >language pair the model was defined for. They also typically include a
> >> >> >large set of tools that enable new models to be built from large sets
> >> >> >of example translations (“parallel data”) and monolingual texts. These
> >> >> >toolkits are usually built to support the agendas of the (largely)
> >> >> >academic researchers that build them: the repeated cycle of building
> >> >> >new models, tuning model parameters against development data, and
> >> >> >evaluating them against held-out test data, using standard metrics for
> >> >> >testing the quality of MT output.
> >> >> >
> >> >> >Together, these three factors (the quality of machine translation
> >> >> >output, the feasibility of translating on standard computers, and the
> >> >> >availability of tools to build models) make it reasonable for end users
> >> >> >to use MT as a black-box service, and to run it on their personal
> >> >> >machines.
> >> >> >
> >> >> >These factors make it a good time for an organization with the status
> >> >> >of the Apache Software Foundation to host a machine translation project.
> >> >> >
> >> >> >== Current Status ==
> >> >> >Joshua was originally ported from David Chiang’s Python implementation
> >> >> >of Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
> >> >> >University. The current version is maintained by Matt Post at Johns
> >> >> >Hopkins’ Human Language Technology Center of Excellence. Joshua has had
> >> >> >many releases, with over 20 source code tags. The last release of
> >> >> >Joshua was 6.0.5 on November 5th, 2015.
> >> >> >
> >> >> >== Meritocracy ==
> >> >> >The current developers are familiar with meritocratic open source
> >> >> >development at Apache. Apache was chosen specifically because we want
> >> >> >to encourage this style of development for the project.
> >> >> >
> >> >> >== Community ==
> >> >> >Joshua is used widely across the world. Perhaps its biggest (known)
> >> >> >research / industrial user is the Amazon research group in Berlin.
> >> >> >Another user is the US Army Research Lab. No formal census has been
> >> >> >undertaken, but posts to the Joshua technical support mailing list,
> >> >> >along with the occasional contributions, suggest small research and
> >> >> >academic communities spread across the world, many of them in India.
> >> >> >
> >> >> >During incubation, we will explicitly seek to increase our usage
> >> >> >across the board, including academic research, industry, and other
> >> >> >end users interested in statistical machine translation.
> >> >> >
> >> >> >== Core Developers ==
> >> >> >The current set of core developers is fairly small, having shrunk with
> >> >> >the graduation of some core student participants from Johns Hopkins.
> >> >> >However, Joshua is used fairly widely, as mentioned above, and the
> >> >> >principal researcher at Johns Hopkins remains committed to continuing
> >> >> >to use and develop it. A number of new community members have recently
> >> >> >become interested in Joshua due to its projected use in a number of
> >> >> >ongoing DARPA projects such as XDATA and Memex.
> >> >> >
> >> >> >== Alignment ==
> >> >> >Joshua is currently Copyright (c) 2015, Johns Hopkins University. All
> >> >> >rights reserved. It is licensed under the BSD 2-clause license. The
> >> >> >intention is to relicense this code under the Apache License 2.0
> >> >> >(AL2.0), which would permit expanded and increased use of the software
> >> >> >within Apache projects. There is currently an ongoing effort within the
> >> >> >Apache Tika community to utilize Joshua within Tika’s Translate API;
> >> >> >see [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
> >> >> >
> >> >> >== Known Risks ==
> >> >> >
> >> >> >=== Orphaned products ===
> >> >> >At the moment, regular contributions are made by a single
> >> >> >contributor, the lead maintainer. He (Matt Post) plans to continue
> >> >> >development for the next few years, but he is still a single point of
> >> >> >failure, since the graduate students who worked on the project have
> >> >> >moved on to jobs, mostly in industry. Our goal is to mitigate this risk
> >> >> >by growing the community at Apache, at a minimum with users and
> >> >> >participants from NASA JPL.
> >> >> >
> >> >> >=== Inexperience with Open Source ===
> >> >> >The teams at both Johns Hopkins and NASA JPL have experience with many
> >> >> >open source software projects at Apache and elsewhere. We understand
> >> >> >"how it works" here at the foundation.
> >> >> >
> >> >> >
> >> >> >== Relationships with Other Apache Products ==
> >> >> >Joshua includes dependencies on Hadoop, and is also included as a
> >> >> >plugin in Apache Tika. We are also interested in coordinating with
> >> >> >other projects, including Spark, and with other projects needing MT
> >> >> >services for language translation.
> >> >> >
> >> >> >== Developers ==
> >> >> >Joshua has only one regular developer, who is employed by Johns
> >> >> >Hopkins University. NASA JPL (Mattmann and McGibbney) have been
> >> >> >contributing lately, including a Brew formula and other contributions
> >> >> >to the project through the DARPA XDATA and Memex programs.
> >> >> >
> >> >> >== Documentation ==
> >> >> >Documentation and publications related to Joshua can be found at
> >> >> >joshua-decoder.org. The source for the Joshua documentation is
> >> >> >currently hosted on GitHub at
> >> >> >https://github.com/joshua-decoder/joshua-decoder.github.com
> >> >> >
> >> >> >== Initial Source ==
> >> >> >Current source resides on GitHub: github.com/joshua-decoder/joshua
> >> >> >(the main decoder and toolkit) and github.com/joshua-decoder/thrax
> >> >> >(the grammar extraction tool).
> >> >> >
> >> >> >== External Dependencies ==
> >> >> >Joshua has a number of external dependencies. Only BerkeleyLM (Apache
> >> >> >2.0) and KenLM (LGPL 2.1) are run-time decoder dependencies (one of
> >> >> >which is needed for translating sentences with pre-built models). The
> >> >> >rest are dependencies for the build system and pipeline, used for
> >> >> >constructing and training new models from parallel text.
> >> >> >
> >> >> >Apache projects:
> >> >> > * Ant
> >> >> > * Hadoop
> >> >> > * Commons
> >> >> > * Maven
> >> >> > * Ivy
> >> >> >
> >> >> >There are also a number of other open-source projects with various
> >> >> >licenses that the project depends on, both dynamically (runtime) and
> >> >> >statically.
> >> >> >
> >> >> >=== GNU GPL 2 ===
> >> >> > * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
> >> >> >
> >> >> >=== LGPL 2.1 ===
> >> >> > * KenLM: github.com/kpu/kenlm
> >> >> >
> >> >> >=== Apache 2.0 ===
> >> >> > * BerkeleyLM: https://code.google.com/p/berkeleylm/
> >> >> >
> >> >> >=== GNU GPL ===
> >> >> > * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
> >> >> >
> >> >> >== Required Resources ==
> >> >> > * Mailing Lists
> >> >> >   * [email protected]
> >> >> >   * [email protected]
> >> >> >   * [email protected]
> >> >> >
> >> >> > * Git Repos
> >> >> >   * https://git-wip-us.apache.org/repos/asf/joshua.git
> >> >> >
> >> >> > * Issue Tracking
> >> >> >   * JIRA Joshua (JOSHUA)
> >> >> >
> >> >> > * Continuous Integration
> >> >> >   * Jenkins builds on https://builds.apache.org/
> >> >> >
> >> >> > * Web
> >> >> >   * http://joshua.incubator.apache.org/
> >> >> >   * wiki at http://cwiki.apache.org
> >> >> >
> >> >> >== Initial Committers ==
> >> >> >The following is a list of the planned initial Apache committers (the
> >> >> >active subset of the committers for the current repository on GitHub).
> >> >> >
> >> >> > * Matt Post ([email protected])
> >> >> > * Lewis John McGibbney ([email protected])
> >> >> > * Chris Mattmann ([email protected])
> >> >> >
> >> >> >== Affiliations ==
> >> >> >
> >> >> > * Johns Hopkins University
> >> >> >   * Matt Post
> >> >> >
> >> >> > * NASA JPL
> >> >> >   * Chris Mattmann
> >> >> >   * Lewis John McGibbney
> >> >> >
> >> >> >
> >> >> >== Sponsors ==
> >> >> >=== Champion ===
> >> >> > * Chris Mattmann (NASA/JPL)
> >> >> >
> >> >> >=== Nominated Mentors ===
> >> >> > * Paul Ramirez
> >> >> > * Lewis John McGibbney
> >> >> > * Chris Mattmann
> >> >> >
> >> >> >== Sponsoring Entity ==
> >> >> >The Apache Incubator
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
>
>
