Repository: incubator-joshua
Updated Branches:
  refs/heads/master 621edeada -> 97c155f4f


Fix: memory overflow in tuning

When larger language models are used, the tuner needs more memory.
Even though we allocate more memory in the training pipeline script,
the value is ignored in one place in the pipeline.

This patch fixes that by exporting the JOSHUA_MEM environment variable
and passing it on to the JVM.
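The patched code path in run_tuner.py can be sketched as follows. This is an illustrative rewrite, not the committed code: build_decoder_cmd is a hypothetical wrapper around the logic the patch adds, and the paths are placeholders.

```python
import os

def build_decoder_cmd(joshua_home, config_file):
    """Build the joshua-decoder command line, forwarding the JVM heap
    size through the JOSHUA_MEM environment variable when it is set."""
    # pipeline.pl exports JOSHUA_MEM (e.g. "4g"); fall back to the
    # decoder's own default when the variable is absent.
    mem_size = os.environ.get('JOSHUA_MEM')
    mem_arg = '-m %s' % mem_size if mem_size else ''
    return "%s/bin/joshua-decoder %s -c %s -show-weights -v 0" % (
        joshua_home, mem_arg, config_file)
```

With JOSHUA_MEM unset the command is unchanged; with JOSHUA_MEM=8g the decoder is invoked with "-m 8g", so the tuner's JVM gets the heap size configured in the pipeline.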


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/cad86aa7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/cad86aa7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/cad86aa7

Branch: refs/heads/master
Commit: cad86aa7512f2829c8a964f2ccc3f69c7cd2f1c7
Parents: 621edea
Author: Thamme Gowda <t...@isi.edu>
Authored: Mon Mar 5 17:13:27 2018 -0800
Committer: Thamme Gowda <t...@isi.edu>
Committed: Mon Mar 5 17:13:27 2018 -0800

----------------------------------------------------------------------
 scripts/training/pipeline.pl  | 2 ++
 scripts/training/run_tuner.py | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/cad86aa7/scripts/training/pipeline.pl
----------------------------------------------------------------------
diff --git a/scripts/training/pipeline.pl b/scripts/training/pipeline.pl
index 4c6380c..b057dac 100755
--- a/scripts/training/pipeline.pl
+++ b/scripts/training/pipeline.pl
@@ -135,6 +135,8 @@ my $FILTERING = "fast";
 # a lot more than this for SAMT decoding (though really it depends
 # mostly on your grammar size)
 my $JOSHUA_MEM = "4g";
+# export the environment var
+$ENV{'JOSHUA_MEM'} = $JOSHUA_MEM;
 
 # the amount of memory available for hadoop processes (passed to
 # Hadoop via -Dmapred.child.java.opts
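On the pipeline.pl side the fix is a one-line export into %ENV, which child processes inherit. pipeline.pl is Perl, but the same pattern, shown here purely as an illustration in Python, looks like this:

```python
import os

# JVM heap size for the decoder; same default as pipeline.pl uses.
JOSHUA_MEM = "4g"

# Placing the value in os.environ makes it visible to every child
# process started afterwards (e.g. run_tuner.py), which is exactly
# what $ENV{'JOSHUA_MEM'} = $JOSHUA_MEM does in Perl.
os.environ['JOSHUA_MEM'] = JOSHUA_MEM
```

Any subprocess launched after this point sees JOSHUA_MEM in its environment without further plumbing.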

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/cad86aa7/scripts/training/run_tuner.py
----------------------------------------------------------------------
diff --git a/scripts/training/run_tuner.py b/scripts/training/run_tuner.py
index 38059fd..d548aee 100755
--- a/scripts/training/run_tuner.py
+++ b/scripts/training/run_tuner.py
@@ -348,7 +348,10 @@ def get_features(config_file):
     """Queries the decoder for all dense features that will be fired by the feature
     functions activated in the config file"""
 
-    output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True)
+    mem_size = os.environ.get('JOSHUA_MEM', None)
+    mem_arg = '-m %s' % mem_size if mem_size else ''
+    decode_cmd = "%s/bin/joshua-decoder %s -c %s -show-weights -v 0" % (JOSHUA, mem_arg, config_file)
+    output = check_output(decode_cmd, shell=True)
     features = []
     for index, item in enumerate(output.split('\n'.encode(encoding='utf_8', errors='strict'))):
         item = item.decode()
