Updated the quick start guide

Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/commit/9c64de09
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/tree/9c64de09
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/diff/9c64de09

Branch: refs/heads/asf-site
Commit: 9c64de0919c6ee7ddeefc4b2590ee4cd23d7e016
Parents: afcbef1
Author: Matt Post <[email protected]>
Authored: Tue May 19 18:39:15 2015 -0400
Committer: Matt Post <[email protected]>
Committed: Tue May 19 18:39:15 2015 -0400

----------------------------------------------------------------------
 6.0/quick-start.md | 56 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 16 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/9c64de09/6.0/quick-start.md
----------------------------------------------------------------------
diff --git a/6.0/quick-start.md b/6.0/quick-start.md
index 1531521..55d025a 100644
--- a/6.0/quick-start.md
+++ b/6.0/quick-start.md
@@ -3,21 +3,30 @@ layout: default6
 title: Quick Start
 ---

-The quickest way to use Joshua is to download a
-[pre-built model](/language-packs/) and use them to start translating data.
+If you just want to use Joshua to translate data, the quickest way is
+to download a [pre-built model](/language-packs/).

-Building your own models takes a bit more work, and requires you to
-supply parallel data that the models can be trained from. Information
-about how to do this can be found in [the pipeline documentation](/6.0/pipeline.html).
+If not language pack is available, or if you have your own parallel
+data that you want to train the translation engine on, then you have
+to build your own model. This takes a bit more knowledge and effort,
+but is made easier with Joshua's [pipeline script](pipeline.html),
+which runs all the steps of preparing data, aligning it, and
+extracting and tuning component models.

-Our <a href="pipeline.html">pipeline script</a> is the quickest way to get started. For example, to
-train and test a complete model translating from Bengali to English:
+Detailed information about running the pipeline can be found in
+[the pipeline documentation](/6.0/pipeline.html), but as a quick
+start, you can build a simple Bengali--English model by following
+these instructions.

-First, download the Indian languages data:
+*NOTE: We suggest you build models outside the `$JOSHUA` directory*.
+
+First, download the dataset:

-    curl -#L https://github.com/joshua-decoder/indian-parallel-corpora/tarball/master > indian-languages.tgz
-    tar xf indian-languages.tgz
-    ln -s joshua-decoder-indian-parallel-corpora-* input
+    mkdir -p ~/models/bn-en/
+    cd ~/models/bn-en
+    curl -L https://github.com/joshua-decoder/indian-parallel-corpora/tarball/master > indian-languages.tgz
+    tar xf indian-languages.tgz
+    ln -s joshua-decoder-indian-parallel-corpora-* input

 Then, train and test a model

@@ -27,8 +36,23 @@ Then, train and test a model
       --tune input/bn-en/tok/dev.bn-en \
       --test input/bn-en/tok/devtest.bn-en

-This will align the data with the Berkeley aligner, build a Hiero model, tune with MERT, decode the
-test sets, and reports results that should correspond with what you find on <a
-href="/indian-parallel-corpora/">the Indian Parallel Corpora page</a>. For
-more details, including information on the many options available with the pipeline script, please
-see <a href="pipeline.html">its documentation page</a>.
+This will align the data with the Berkeley aligner, build a Hiero
+model, tune with MERT, decode the test sets, and reports results that
+should correspond with what you find on
+[the Indian Parallel Corpora page](/indian-parallel-corpora/). For
+more details, including information on the many options available with
+the pipeline script, please see [its documentation page](pipeline.html).
+
+Finally, you can export the full model as a language pack:
+
+    ./run-bundler.py \
+      tune/joshua.config.final \
+      language-pack-bn-en \
+      --pack-tm grammar.gz
+
+(or possibly `tune/1/joshua.config.final` if you're using an older version of
+the pipeline).
+
+This will create a [runnable model](bundle.html) in
+`language-pack-bn-en`. See the `README` file in that directory for
+information on how to run the decoder.
