I found the reference for that 1,000,000 number a bit too late -- according
to this more recent paper from Koehn and Knowles, it's more like 15,000,000
tokens of training data before NMT catches up with phrase-based MT, and they
leave syntax-based MT out of the comparison.

https://arxiv.org/pdf/1706.03872.pdf
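
For a rough sense of which side of that threshold a given language pair or
domain falls on, here's a quick token-count check (illustrative only; the
file name in the usage comment is made up, and the ~15M figure is just the
ballpark from the paper):

    # count_tokens.py -- whitespace-token count for one side of a parallel corpus
    import sys

    def count_tokens(path):
        with open(path, encoding="utf-8") as f:
            return sum(len(line.split()) for line in f)

    if __name__ == "__main__":
        # e.g. python count_tokens.py train.en   (hypothetical file name)
        n = count_tokens(sys.argv[1])
        side = "above" if n >= 15_000_000 else "below"
        print(f"{n:,} tokens -- {side} the ~15M ballpark")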

-John

On Sun, Jul 2, 2017 at 12:38 PM, John Hewitt <john...@seas.upenn.edu> wrote:

> I've talked with the ModernMT people; they're well aware that they're in a
> neural MT world, and they also know that there's a sizable market for
> non-neural MT solutions.
> To back this up -- Philipp Koehn gave a talk in March on comparing
> phrase-based, syntax-based, and neural MT in low-resource settings, that
> is, when the amount of bilingual text to train on is small.
>
> Neural MT needs (if I remember correctly) about 1,000,000 tokens of
> training data to outpace syntax-based MT.
> Many language pairs (and, for that matter, domains within a single
> language pair) do not meet that requirement, and in those cases
> syntax-based MT performs best.
>
> That being said, there are some cool opportunities to combine neural and
> syntax-based MT. I can't necessarily commit the work hours right now, but
> I've worked with xnmt <https://github.com/neulab/xnmt>, an MIT-licensed
> neural MT library that is purpose-built to be highly modular. It may offer
> some good opportunities to make an ensemble system.
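>
> To make that concrete, one simple ensemble shape is n-best reranking: let
> the syntax-based decoder produce an n-best list and rescore it with the
> neural model, combining the two scores log-linearly. A hypothetical sketch
> -- the two scoring callables are stand-ins, not anything from xnmt or
> Joshua:
>
>     def rerank_nbest(nbest, syntax_score, neural_score, lam=0.5):
>         """Pick the candidate with the best interpolated log-score.
>         nbest: candidate translations for one source sentence;
>         syntax_score / neural_score: callables returning a log-prob-like
>         score for a candidate; lam weights the neural model."""
>         return max(nbest,
>                    key=lambda hyp: lam * neural_score(hyp)
>                                    + (1.0 - lam) * syntax_score(hyp))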
>
> On Sun, Jul 2, 2017 at 4:22 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>
>> I think it's interesting, as it extends some features that Joshua also
>> has; it's open source and has good results compared with NMT.
>>
>> Tommaso
>>
>> On Sat, Jul 1, 2017 at 18:56 Suneel Marthi <suneel.mar...@gmail.com> wrote:
>>
>> > Is this the latest/greatest paper around MT @tommaso ??
>> >
>> > On Sat, Jul 1, 2017 at 7:55 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>> >
>> > > I accidentally found the paper about mmt [1]
>> > >
>> > > [1] https://ufal.mff.cuni.cz/eamt2017/user-project-product-papers/papers/user/EAMT2017_paper_88.pdf
>> > >
>> > > On Thu, Dec 1, 2016 at 22:19 Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > >
>> > > > Guys I want to point you at the DARPA D3M program:
>> > > >
>> > > > http://www.darpa.mil/program/data-driven-discovery-of-models
>> > > >
>> > > > I’m part of the Government Team for the program. This will be a good
>> > > > connection to have b/c it's focused on automatically doing model and
>> > > > code building for ML-based approaches.
>> > > >
>> > > >
>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > > > Chris Mattmann, Ph.D.
>> > > > Principal Data Scientist, Engineering Administrative Office (3010)
>> > > > Manager, Open Source Projects Formulation and Development Office (8212)
>> > > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > > > Office: 180-503E, Mailstop: 180-503
>> > > > Email: chris.a.mattm...@nasa.gov
>> > > > WWW:  http://sunset.usc.edu/~mattmann/
>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > > > Director, Information Retrieval and Data Science Group (IRDS)
>> > > > Adjunct Associate Professor, Computer Science Department
>> > > > University of Southern California, Los Angeles, CA 90089 USA
>> > > > WWW: http://irds.usc.edu/
>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > > >
>> > > >
>> > > > On 12/1/16, 1:15 PM, "Matt Post" <p...@cs.jhu.edu> wrote:
>> > > >
>> > > >     John,
>> > > >
>> > > >     Thanks for sharing, this is really helpful. I didn't realize that
>> > > >     Marcello was involved.
>> > > >
>> > > >     I think we can identify with the NMT danger. I still think there is
>> > > >     a big niche that deep learning approaches won't reach for a few
>> > > >     years, until GPUs become super prevalent, which is why I like
>> > > >     ModernMT's approaches; they overlap with many of the things I've
>> > > >     been thinking about. One thing I really like is their automatic
>> > > >     context-switching approach. This is a great way to build
>> > > >     general-purpose models, and I'd like to mimic it. I have some
>> > > >     general ideas about how this should be implemented but am also
>> > > >     looking into the literature here.
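>> > > >
>> > > >     Just to sketch one naive version of that context switching (my
>> > > >     guess at the general shape, not how ModernMT actually does it):
>> > > >     score the incoming document against a text sample from each
>> > > >     domain with TF-IDF cosine similarity, then use the normalized
>> > > >     similarities as mixture weights over per-domain models.
>> > > >
>> > > >         from sklearn.feature_extraction.text import TfidfVectorizer
>> > > >         from sklearn.metrics.pairwise import cosine_similarity
>> > > >
>> > > >         def domain_weights(input_doc, domain_samples):
>> > > >             """domain_samples: {domain name: sample text from that domain}.
>> > > >             Returns a weight per domain for this input document."""
>> > > >             names = list(domain_samples)
>> > > >             vec = TfidfVectorizer().fit([domain_samples[n] for n in names] + [input_doc])
>> > > >             sims = cosine_similarity(
>> > > >                 vec.transform([input_doc]),
>> > > >                 vec.transform([domain_samples[n] for n in names]))[0]
>> > > >             total = sims.sum()
>> > > >             if total == 0:
>> > > >                 return {n: 1.0 / len(names) for n in names}
>> > > >             return {n: s / total for n, s in zip(names, sims)}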
>> > > >
>> > > >     matt
>> > > >
>> > > >
>> > > >     > On Dec 1, 2016, at 1:46 PM, John Hewitt <john...@seas.upenn.edu> wrote:
>> > > >     >
>> > > >     > I had a few good conversations over dinner with this team at
>> > > >     > AMTA in Austin in October.
>> > > >     > They seem to be in the interesting position where their work is
>> > > >     > good, but is in danger of being superseded by neural MT as they
>> > > >     > come out of the gate.
>> > > >     > Clearly, it has benefits over NMT, and is easier to adopt, but
>> > > >     > may not be the winner over the long run.
>> > > >     >
>> > > >     > Here's the link to their AMTA tutorial:
>> > > >     > https://amtaweb.org/wp-content/uploads/2016/11/MMT_Tutorial_FedericoTrombetti_wide-cover.pdf
>> > > >     >
>> > > >     > -John
>> > > >     >
>> > > >     > On Thu, Dec 1, 2016 at 10:17 AM, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > > >     >
>> > > >     >> Wow, seems like this kind of overlaps with BigTranslate as
>> > > >     >> well... thanks for passing along, Matt
>> > > >     >>
>> > > >     >> On 12/1/16, 4:47 AM, "Matt Post" <p...@cs.jhu.edu> wrote:
>> > > >     >>
>> > > >     >>    Just came across this, and it's really cool:
>> > > >     >>
>> > > >     >>        https://github.com/ModernMT/MMT
>> > > >     >>
>> > > >     >>    See the README for some great use cases. I'm surprised I'd
>> > > >     >>    never heard of this before as it's EU funded and associated
>> > > >     >>    with U Edinburgh.
