Thanks for organizing, Lewis. Here are some topics for discussion that I've been noting while working with Joshua. None of these are high-priority issues for me, but if we are all in agreement on them, it might make sense to log them.
Boring code convention stuff: logging with log4j, throwing RuntimeExceptions instead of checked (typed) exceptions, removing all System.exit() calls (replacing them with RuntimeExceptions), and refactoring some large files.

Testing: integrate the existing unit tests, and provide some good test examples so others can begin adding more tests.

Configuration: we also touched on IoC, CLI args, and the configuration changes that are possible.

OO stuff: Joshua is pretty good here, but I would personally prefer more granular interfaces. I wouldn't advocate radical changes, but a little refactoring might make sense to better align with the interface segregation principle: https://en.wikipedia.org/wiki/Interface_segregation_principle

JNI reliance: We've found that KenLM works really well with Joshua, but there is one issue with using it. It requires many JNI calls during decoding, and these calls hurt GC performance. In fact, when a JNI call happens, the GC throws out any work it may have done and waits until the JNI call completes. The GC then resumes and starts marking objects for collection from scratch. This is not ideal, especially for programs with large heaps (Joshua / Spark). There are a couple of ways we could mitigate this, and I think they would all speed up Joshua quite a lot.

High level roadmap topics:
* Distributed decoding is something I'll likely continue working on. There are some obvious things we can do, given the usage patterns of translation engines, that can help us out here (I think).
* Providing a way to optimize Joshua for low-latency, low-throughput calls could be interesting for those with near-real-time use cases. Providing a way to optimize for high-latency, high-throughput calls could be interesting for async/batch use cases.
* The machine learning optimization algorithms (MERT/MIRA) could be cleaned up a bit.
* The Vocabulary could probably be replaced with a simpler implementation (without sacrificing performance).
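Here is a minimal sketch of the "replace System.exit() with RuntimeExceptions" convention. It assumes a hypothetical config-loading helper: ConfigLoader, requireConfig, and JoshuaConfigException are illustrative names, not existing Joshua classes.

```java
import java.io.File;

/**
 * Hypothetical example only: ConfigLoader and JoshuaConfigException are
 * illustrative names, not classes that exist in Joshua today.
 */
public class ConfigLoader {

    /** Unchecked, so existing call sites need no new "throws" clauses. */
    public static class JoshuaConfigException extends RuntimeException {
        public JoshuaConfigException(String message) {
            super(message);
        }

        public JoshuaConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Before (the pattern being removed):
    //   if (!file.isFile()) {
    //       System.err.println("Config not found: " + path);
    //       System.exit(1); // terminates the whole JVM, including any embedding app
    //   }

    /** After: signal the failure and let the caller decide how to handle it. */
    public static File requireConfig(String path) {
        File file = new File(path);
        if (!file.isFile()) {
            throw new JoshuaConfigException("Config file not found: " + path);
        }
        return file;
    }

    public static void main(String[] args) {
        try {
            requireConfig("/nonexistent/joshua.config");
        } catch (JoshuaConfigException e) {
            // An embedding application can report the error and keep running.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Because the exception is unchecked, callers such as a server or a Spark job embedding the decoder can catch and log it without the whole JVM going down. A log4j call at the catch site is left out to keep the sketch dependency-free.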
-Kellen

On Thu, May 12, 2016 at 12:32 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Folks,
>
> Kellen, Henri and I are going to get together tomorrow 13th around
> lunchtime PST to talk everything Joshua.
> Would be great to have others online via GChat if possible.
> Let's say around 11am PST for the time being.
> See you then folks.
> Thanks
> Lewis
>
>
> --
> *Lewis*
>
