About logging: are you proposing to use the log4j interface in the code? I would recommend using slf4j [1].
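For illustration, a minimal sketch of what coding against the slf4j API looks like (class name `DecoderExample` is hypothetical, not an actual Joshua class; the concrete backend, e.g. logback or slf4j-log4j12, is selected by whichever binding is on the classpath at deployment time):

```java
// Minimal slf4j usage sketch: the code depends only on the slf4j API,
// while the concrete logging backend is chosen at runtime via the
// classpath. DecoderExample is a hypothetical class for illustration.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DecoderExample {
    private static final Logger LOG = LoggerFactory.getLogger(DecoderExample.class);

    public static void main(String[] args) {
        String grammar = "grammar.gz";
        // Parameterized messages ({} placeholders) avoid string
        // concatenation cost when the log level is disabled.
        LOG.info("Loading grammar from {}", grammar);
        try {
            throw new IllegalStateException("demo failure");
        } catch (IllegalStateException e) {
            LOG.error("Decoding failed", e);
        }
    }
}
```

Swapping the backend later then requires no source changes, only a different binding jar on the classpath.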
On Thu, May 12, 2016 at 2:30 PM, kellen sunderland <[email protected]> wrote:

> Thanks for organizing Lewis,
>
> Here's some topics for discussion I've been noting while working with
> Joshua. None of these are high-priority issues for me, but if we are all
> in agreement on them it might make sense to log them.
>
> Boring code convention stuff: Logging with log4j, throw runtime exceptions
> instead of typed, remove all system exits (replace with RuntimeExceptions),
> refactor some large files.
>
> Testing: Integrate existing unit tests, provide some good test examples so
> others can begin adding more tests.
>
> Configuration: We also touched on IoC, CLI args, and configuration changes
> that are possible.
>
> OO stuff: Joshua is pretty good here, but I would personally prefer more
> granular interfaces. I wouldn't advocate radical changes, but maybe a
> little refactoring might make sense to better align with the interface
> segregation principle.
> https://en.wikipedia.org/wiki/Interface_segregation_principle
>
> JNI reliance: We've found KenLM works really well with Joshua, but there
> is one issue with using it. It requires many JNI calls during decoding,
> and these calls impact GC performance. In fact, when a JNI call happens
> the GC throws out any work it may have done and quits until the JNI call
> completes. The GC will then resume and start marking objects for
> collection from scratch. This is not ideal, especially for programs with
> large heaps (Joshua / Spark). There are a couple of ways we could mitigate
> this, and I think they'd all speed up Joshua quite a lot.
>
> High-level roadmap topics:
>
> * Distributed decoding is something I'll likely continue working on.
>   There are some obvious things we can do, given usage patterns of
>   translation engines, that can help us out here (I think).
> * Providing a way to optimize Joshua for low-latency, low-throughput
>   calls could be interesting for those with near real-time use cases.
>   Providing a way to optimize for high-latency, high-throughput could be
>   interesting for async/batch use cases.
> * The machine learning optimization algorithms could be cleaned up a bit
>   (MERT/MIRA).
> * The Vocabulary could probably be replaced with a simpler implementation
>   (without sacrificing performance).
>
> -Kellen
>
>
> On Thu, May 12, 2016 at 12:32 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi Folks,
> > Kellen, Henri and I are going to get together tomorrow 13th around
> > lunchtime PST to talk everything Joshua.
> > Would be great to have others online via GChat if possible.
> > Let's say around 11am PST for the time being.
> > See you then folks.
> > Thanks
> > Lewis
> >
> >
> > --
> > *Lewis*
> >
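As a footnote to the "remove all system exits" point in the quoted list: the usual pattern is for library code to throw an unchecked exception and let only the top-level entry point decide the process's fate. A minimal sketch, assuming a hypothetical `GrammarLoader` class (not an actual Joshua class):

```java
// Hypothetical sketch of the System.exit -> RuntimeException refactor
// discussed above. GrammarLoader and DecoderException are illustrative
// names, not real Joshua classes.
public class GrammarLoader {

    /** Thrown when the decoder cannot continue; replaces System.exit(1). */
    public static class DecoderException extends RuntimeException {
        public DecoderException(String message) {
            super(message);
        }
    }

    public static String load(String path) {
        if (path == null || path.isEmpty()) {
            // Before: System.exit(1) -- untestable, and it kills any
            // embedding JVM (e.g. a server or Spark executor).
            // After: callers can catch, log, and recover.
            throw new DecoderException("No grammar file specified");
        }
        return "loaded:" + path;
    }
}
```

An embedding application (or a unit test) can now catch `DecoderException` instead of losing the whole JVM, which also makes the "integrate existing unit tests" item above easier.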
