GitHub user fhieber opened a pull request:

    https://github.com/apache/incubator-joshua/pull/41

    Major refactoring of features/rules/grammars in Joshua

    Major refactoring of core decoder components (Rule.java, FeatureVector.java 
and grammars). The core idea of this change is to simplify feature handling 
inside Joshua. Please note that this change is NOT backwards compatible. The 
following changes were made:
    
    - No distinction between sparse and dense features inside the decoder 
anymore. Each feature stored at the rule is 'owned' by the grammar that 
contains the rule. An 'owned' feature simply means that its name is prepended 
with the owner string: 0=0.2 becomes <owner>_0=0.2. This applies to both dense 
features (features that occur at every rule), as well as sparse features. 
Please note that the old prefix 'tm_' is no longer used.
    - Having only one type of feature, a revised version of FeatureVector.java 
was built that is greatly simplified. It is basically a HashMap of FeatureId 
(typed as ints) to feature values. FeatureIds are created/hashed by the new 
global mapping FeatureMap.java, which maintains a bidirectional mapping between 
feature ids and feature names. This also allowed getting rid of storing feature 
names in the vocabulary.
    - The simplified FeatureVector allowed the removal of all 
'reportDenseFeatures'/'getNumDenseFeatures' methods in the decoder and the 
grammar interface.
    - The traditional but very obscure practice of flipping the sign of dense 
features but not sparse features was removed. The feature value in the decoder 
is now exactly the value stored at the rule.
    - The Rule class was changed to adhere to object-oriented principles. It 
now has only one constructor, which requires all of its dependencies, and these 
cannot be changed later. This forces Rule creators to finalize the 
dependencies (deciding on an owner of the rule and the hashing of the feature 
vector).
    - The unused concept of the precomputableCost in a rule was also removed. 
Rules still 'cache' their estimated cost.
    - The various Grammar and MemoryBasedBatchGrammar constructors were unified 
and a lot of old obscure code was removed.
    - Due to the change above, the PhraseModel feature function that fires 
feature values for features stored at rules is greatly simplified.
    - As featureVectors at Rules are final and have to have an owner, feature 
sharing across multiple grammars would need to be handled by a separate feature 
function implementation which is transparent.
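
    The owner prefixing and the id-keyed feature vector described in the list 
above can be illustrated with a minimal sketch. This is not the code in this 
pull request: the class OwnedFeatureSketch, the nested FeatureIds class, and 
the methods ownedName/parse are hypothetical names chosen for illustration, 
and the whitespace-separated "name=value" input format is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; names and input format are assumptions,
// not the FeatureMap/FeatureVector API introduced by this pull request.
public class OwnedFeatureSketch {

  /** Bidirectional mapping between feature names and int ids. */
  static final class FeatureIds {
    private final Map<String, Integer> nameToId = new HashMap<>();
    private final List<String> idToName = new ArrayList<>();

    /** Returns the id for a name, assigning the next free id on first use. */
    synchronized int idOf(String name) {
      Integer id = nameToId.get(name);
      if (id == null) {
        id = idToName.size();
        nameToId.put(name, id);
        idToName.add(name);
      }
      return id;
    }

    /** Reverse lookup: the name that was registered for this id. */
    synchronized String nameOf(int id) {
      return idToName.get(id);
    }
  }

  /** Prefixes the owner string, so feature "0" owned by "pt" becomes "pt_0". */
  static String ownedName(String owner, String featureName) {
    return owner + "_" + featureName;
  }

  /** Parses "name=value" pairs into a map from feature id to value. */
  static Map<Integer, Float> parse(String features, String owner, FeatureIds ids) {
    Map<Integer, Float> vector = new HashMap<>();
    for (String pair : features.trim().split("\\s+")) {
      if (pair.isEmpty()) {
        continue; // empty input yields an empty vector
      }
      int eq = pair.lastIndexOf('=');
      String name = pair.substring(0, eq);
      // The value is stored exactly as written; no sign flipping.
      float value = Float.parseFloat(pair.substring(eq + 1));
      vector.put(ids.idOf(ownedName(owner, name)), value);
    }
    return vector;
  }

  public static void main(String[] args) {
    FeatureIds ids = new FeatureIds();
    Map<Integer, Float> vector = parse("0=0.2 1=-1.5 Lexical=1", "pt", ids);
    for (Map.Entry<Integer, Float> entry : vector.entrySet()) {
      System.out.println(ids.nameOf(entry.getKey()) + "=" + entry.getValue());
    }
  }
}
```

    In this sketch, dense-style names ("0", "1") and sparse-style names 
("Lexical") go through the same code path, which is the point of dropping the 
sparse/dense distinction.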
    
    This commit also updates all existing (and enabled) unit tests, all of 
which pass. Existing regression tests do NOT work in this commit since many of 
the grammars are packed and would need to be re-packed.
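
    The single-constructor Rule design described in the list above could look 
roughly like the following sketch. It is not the Rule class from this pull 
request: ImmutableRuleSketch and its fields (owner, source, target, features) 
are hypothetical, and the defensive copy of the feature map is one possible 
way to guarantee that dependencies cannot be changed later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch only; not Joshua's Rule class. All names are assumptions.
public final class ImmutableRuleSketch {
  private final String owner;
  private final String source;
  private final String target;
  private final Map<Integer, Float> features;

  /** The only constructor: every dependency is required up front. */
  public ImmutableRuleSketch(String owner, String source, String target,
                             Map<Integer, Float> features) {
    this.owner = Objects.requireNonNull(owner, "owner");
    this.source = Objects.requireNonNull(source, "source");
    this.target = Objects.requireNonNull(target, "target");
    // Copy, then wrap read-only, so later changes to the caller's map
    // cannot alter the rule.
    this.features = Collections.unmodifiableMap(new HashMap<>(features));
  }

  public String getOwner() { return owner; }
  public String getSource() { return source; }
  public String getTarget() { return target; }
  public Map<Integer, Float> getFeatures() { return features; }

  public static void main(String[] args) {
    Map<Integer, Float> hashed = new HashMap<>();
    hashed.put(0, 0.2f);
    ImmutableRuleSketch rule =
        new ImmutableRuleSketch("pt", "el gato", "the cat", hashed);
    hashed.put(1, 9.9f); // does not affect the rule
    System.out.println(rule.getOwner() + " " + rule.getFeatures());
  }
}
```

    Because there are no setters, the caller has to decide on the owner and 
hash the feature vector before the rule exists.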

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhieber/incubator-joshua 7_features

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #41
    
----
commit 20afddf85263f3def242f721728ac148ef143ad5
Author: Felix Hieber <[email protected]>
Date:   2016-08-17T11:52:39Z

    Major refactoring of core decoder components (Rule.java, FeatureVector.java 
and grammars). The core idea of this change is to simplify feature handling 
inside Joshua. Please note that this change is NOT backwards compatible. The 
following changes were made:
    - No distinction between sparse and dense features inside the decoder 
anymore. Each feature stored at the rule is 'owned' by the grammar that 
contains the rule. An 'owned' feature simply means that its name is prepended 
with the owner string: 0=0.2 becomes <owner>_0=0.2. This applies to both dense 
features (features that occur at every rule), as well as sparse features. 
Please note that the old prefix 'tm_' is no longer used.
    - Having only one type of feature, a revised version of FeatureVector.java 
was built that is greatly simplified. It is basically a HashMap of FeatureId 
(typed as ints) to feature values. FeatureIds are created/hashed by the new 
global mapping FeatureMap.java, which maintains a bidirectional mapping between 
feature ids and feature names. This also allowed getting rid of storing feature 
names in the vocabulary.
    - The simplified FeatureVector allowed the removal of all 
'reportDenseFeatures'/'getNumDenseFeatures' methods in the decoder and the 
grammar interface.
    - The traditional but very obscure practice of flipping the sign of dense 
features but not sparse features was removed. The feature value in the decoder 
is now exactly the value stored at the rule.
    - The Rule class was changed to adhere to object-oriented principles. It 
now has only one constructor, which requires all of its dependencies, and these 
cannot be changed later. This forces Rule creators to finalize the 
dependencies (deciding on an owner of the rule and the hashing of the feature 
vector).
    - The unused concept of the precomputableCost in a rule was also removed. 
Rules still 'cache' their estimated cost.
    - The various Grammar and MemoryBasedBatchGrammar constructors were unified 
and a lot of old obscure code was removed.
    - Due to the change above, the PhraseModel feature function that fires 
feature values for features stored at rules is greatly simplified.
    - As featureVectors at Rules are final and have to have an owner, feature 
sharing across multiple grammars would need to be handled by a separate feature 
function implementation which is transparent.
    
    This commit also updates all existing (and enabled) unit tests, all of 
which pass. Existing regression tests do NOT work in this commit since many of 
the grammars are packed and would need to be re-packed.

----


