[GitHub] incubator-joshua pull request: Performance Improvements to Joshua

KellenSunderland Sat, 02 Apr 2016 09:33:52 -0700

GitHub user KellenSunderland opened a pull request:

    https://github.com/apache/incubator-joshua/pull/1


    Performance Improvements to Joshua

    Hello Joshua folks.  I've got a series of patches to contribute to Joshua, 
most of which are performance improvements.  
    
    **Performance Improvements**
    There are two main performance changes.  First, we've removed some string 
parsing that was not strictly needed and that slowed down decoding.  Second, 
we've added a post-deserialization LRU cache for objects that are commonly 
requested from the packed grammar during decoding.  Because the same objects 
are requested over and over again, the cache avoids the repeated cost of 
unpacking them, allocating new objects, calculating their feature weights, etc. 
 It also dramatically reduces the work the garbage collector needs to do during 
decoding.  
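
    To illustrate the caching idea, here is a minimal sketch of an LRU cache 
placed in front of an expensive lookup.  It is not the code from these patches: 
the patches use a cache from Google Guava, while this sketch swaps in the JDK's 
LinkedHashMap so that it runs without extra dependencies.  The class name 
LruCacheSketch, the loader function and the "rule-" values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a post-deserialization cache; not Joshua's code.
class LruCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader; // the expensive "unpack" step (assumed)
    private int loads = 0;

    LruCacheSketch(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxEntries;
            }
        };
    }

    // Synchronized so that several decoding threads can share one cache.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key); // cache miss: pay the full cost once
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        LruCacheSketch<Integer, String> cache =
            new LruCacheSketch<>(2, id -> "rule-" + id);
        cache.get(1);
        cache.get(2);
        cache.get(1); // hit, no new load
        cache.get(3); // miss, evicts key 2 (least recently used)
        cache.get(1); // still cached
        System.out.println(cache.loadCount()); // prints 3
    }
}
```

    Repeated requests for the same key return the already built object, which 
is where the savings in allocation and garbage collection would come from.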
    
    **New Features**
    This pull request also adds one small new feature that we will develop 
further in future commits: the ability to extract information via a 
StructuredTranslation object, which provides information similar to what is 
printed to stdout.  This means that if you are using Joshua as a library, the 
consuming application will have programmatic access to interesting aspects of 
the translate call, for example what the alignments in this translation were.
    
    **New Dependencies**
    These initial few patches add a new dependency, Google Guava 19.
    http://mvnrepository.com/artifact/com.google.guava/guava/19.0
    https://github.com/google/guava
    LICENSE: https://github.com/google/guava/blob/master/COPYING (Apache 2.0)
    
    **Testing**
    We've also got a number of unit tests that we'll be contributing.  In this 
pull request there are only two, but there should be more to come in further 
pulls.  Currently these tests don't have a build target, but we can add that in 
the future if you (the Joshua maintainers) would find them beneficial.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KellenSunderland/joshua master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1
    
----
commit 9501535dcd67b89e821fd686089f621c5721497f
Author: Felix Hieber <[email protected]>
Date:   2015-02-27T13:05:50Z

    Decoder's Translation class now contains more members including the 
possibility to store word alignment from the derivation. Allows use of Joshua 
decoder class in a larger code project to extract information, rather than 
relying on stdout. Also added a getter for JoshuaConfiguration in the Decoder.

commit 244e6936d8e3e7b30ebbe49ff7a9a2bd0c0c9994
Author: Felix Hieber <[email protected]>
Date:   2015-08-24T06:29:17Z

    Viterbi information is now extracted from the hypergraph using a more 
principled traversal functionality (WalkerFunction).
    Also updated the unit tests.

commit e70677d2eab23daa7082173e6fe337d68aa12230
Author: Kellen Sunderland <[email protected]>
Date:   2015-09-22T11:37:54Z

    Add an LRU cache from Google Guava to decrease allocations in the 
PackedGrammar getRules() call
    Results in a 1.5 times speedup in decoding and a large decrease in required 
garbage collection

commit cabb52cabd5a81088b21b9e01a4668ebb2a85ffa
Author: Kellen Sunderland <[email protected]>
Date:   2015-10-14T15:13:18Z

    kellens: Use Guava's memoize for expensive calls, removed unneeded members
    fhieber: Important bugfix for obtaining word alignments from packedRules in 
multi-threading environment

commit 5665f02ff0385db4f77bf4493db2d96bc63355d8
Author: Felix Hieber <[email protected]>
Date:   2015-12-01T12:34:47Z

    Removed slow and redundant feature string parsing when constructing rules 
from packed grammar (at sort time and at actual construction of feature vector).
    
    This gets rid of parsing the feature strings over and over again, which 
turned out to be slow in profiling. The solution is not perfect, but we get a 
nice speedup of roughly a factor of 5. If JoshuaConfiguration.amortize is set 
to false, grammars are forced to be sorted at decoder startup. Here are the 
stats: New code: Took 561.64 seconds to load pipeline. Old code: Took 2688.60 
seconds to load pipeline.  Basically we are significantly reducing the time for 
sorting the rules by getting rid of an intermediate string representation of 
the features in a rule. Since string parsing of floats is now removed, there 
was some float precision change in the regression test, for which I changed the 
gold output. This is fine.

commit cadd987c16ff012298b42074fb96bab8697fa84f
Author: Kellen Sunderland <[email protected]>
Date:   2016-03-29T13:46:27Z

    Forced synchronization on method that still occasionally fails 
multi-threaded test.
    This is a fix for a very rare multithreading issue we've observed in 
Joshua. We have a test that is able to reproduce the error fairly often when 
run on a host with multiple physical cores.  This patch fixes all errors seen 
both in the test and at runtime.

commit 2cc9996b4ed9e71ae4998a0db3eaef9586b0c69d
Author: Felix Hieber <[email protected]>
Date:   2016-03-29T14:55:07Z

    Remove sorting which may rely on LOCALE of machine
    
    Here we fixed an integration test that fails on any machine whose console 
locale uses a European-style numbering system. For example, the de-DE and 
fr-FR locales would fail this test.
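
    As an illustration of this class of problem (not the actual fix in this 
commit), the sketch below shows how the same number is rendered differently 
depending on the locale.  It assumes that the failure was related to number 
formatting such as the decimal separator, which the commit message does not 
spell out.  The class name LocaleFormatSketch and the sample value are 
hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration of locale-dependent number formatting.
class LocaleFormatSketch {
    // Formats a score with three decimals using the given locale's conventions.
    static String format(double score, Locale locale) {
        return String.format(locale, "%.3f", score);
    }

    public static void main(String[] args) {
        System.out.println(format(0.5, Locale.GERMANY)); // prints 0,500
        System.out.println(format(0.5, Locale.FRANCE));  // prints 0,500
        System.out.println(format(0.5, Locale.ROOT));    // prints 0.500
    }
}
```

    Passing an explicit locale such as Locale.ROOT, rather than relying on the 
machine's default, is one common way to keep test output identical across 
machines.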

commit 9448ba552cd03bacad81eb4b9b5e900db360c00e
Author: Kellen Sunderland <[email protected]>
Date:   2016-03-29T15:23:23Z

    Clean up Slice constructor, Fully loading source tries, lazy loading other 
structures

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
