Thanks for organizing, Lewis,

Here are some topics for discussion I've been noting while working with
Joshua.  None of these are high priority issues for me, but if we are all
in agreement on them it might make sense to log them.

Boring code convention stuff: Logging with log4j, throw RuntimeExceptions
instead of typed (checked) exceptions, remove all System.exit calls (replace
with RuntimeExceptions), refactor some large files.
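
To make the System.exit point concrete, here is a minimal sketch of the
convention. ConfigLoader and readConfig are hypothetical names made up for
illustration, not existing Joshua classes:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not an actual Joshua class.
public class ConfigLoader {

    // Old style (avoid): catch the IOException, print it, then System.exit(1),
    // which kills any JVM that embeds the decoder.
    // New style: wrap the checked exception in a RuntimeException and let the
    // caller decide how to recover.
    public static String readConfig(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException("Could not read config file: " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            readConfig(Paths.get("no-such-file.config"));
        } catch (RuntimeException e) {
            // A server or test harness can log and carry on instead of dying.
            System.err.println("Recovered from: " + e.getMessage());
        }
    }
}
```

The original IOException is kept as the cause, so nothing is lost for
debugging, and callers that cannot recover can simply let it propagate.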

Testing: Integrate existing unit tests, provide some good test examples so
others can begin adding more tests.

Configuration: We also touched on IoC, CLI args, and configuration changes
that are possible.

OO stuff: Joshua is pretty good here, but I would personally prefer more
granular interfaces.  I wouldn't advocate radical changes, but a little
refactoring might make sense to better align with the interface
segregation principle.
https://en.wikipedia.org/wiki/Interface_segregation_principle
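
As a rough sketch of what "more granular" could look like (all names here
are hypothetical and made up for the example, none are taken from the
Joshua code base): a client that only needs scoring depends on a small
scoring interface, rather than on one that also bundles loading and
cleanup.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: these interfaces are invented for this sketch.
public class IspSketch {

    // "Fat" interface: every implementer must provide all three methods,
    // even when a caller only ever scores.
    interface LanguageModelEverything {
        double score(List<String> ngram);
        void load(String path);
        void close();
    }

    // Segregated alternative: one small role per interface.
    interface NgramScorer {
        double score(List<String> ngram);
    }

    interface Loadable {
        void load(String path);
    }

    // A toy scorer that only implements the role it actually supports.
    static final class UniformScorer implements NgramScorer {
        @Override
        public double score(List<String> ngram) {
            return -1.0 * ngram.size(); // fixed cost per word
        }
    }

    // The client depends only on the scoring role.
    static double scoreSentence(NgramScorer scorer, List<String> words) {
        return scorer.score(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world");
        System.out.println(scoreSentence(new UniformScorer(), words));
    }
}
```

A side benefit is that a single-method interface like NgramScorer can be
satisfied by a lambda in tests, with no stub class needed.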

JNI reliance:  We've found KenLM works really well with Joshua, but there
is one issue with using it.  It requires many JNI calls during decoding and
these calls impact GC performance.  In fact when a JNI call happens the GC
throws out any work it may have done and quits until the JNI call
completes.  The GC will then resume and start marking objects for
collection from scratch.  This is not ideal, especially for programs with
large heaps (Joshua / Spark).  There are a couple of ways we could mitigate this
and I think they'd all speed up Joshua quite a lot.

High level roadmap topics:

*  Distributed Decoding is something I'll likely continue working on.
There are some obvious things we can do given usage patterns of translation
engines that can help us out here (I think).
*  Providing a way to optimize Joshua for low-latency, low-throughput calls
could be interesting for those with near real-time use cases.  Providing a
way to optimize for high-latency, high-throughput could be interesting for
async/batch use cases.
*  The machine learning optimization algorithms could be cleaned up a bit
(MERT/MIRA).
*  The Vocabulary could probably be replaced with a simpler implementation
(without sacrificing performance).

-Kellen



On Thu, May 12, 2016 at 12:32 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Folks,
> Kellen, Henri and I are going to get together tomorrow 13th around
> lunchtime PST to talk everything Joshua.
> Would be great to have others online via GChat if possible.
> Let's say around 11am PST for the time being.
> See you then folks.
> Thanks
> Lewis
>
>
> --
> *Lewis*
>
