[
https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299141#comment-15299141
]
Matt Post commented on JOSHUA-270:
----------------------------------
The pipeline is a huge mess, probably not worth salvaging. I'm hoping (maybe
this year?) to rewrite it, perhaps using this:
https://github.com/jhclark/ducttape/
> pipeline.pl needs major refactoring
> -----------------------------------
>
> Key: JOSHUA-270
> URL: https://issues.apache.org/jira/browse/JOSHUA-270
> Project: Joshua
> Issue Type: Bug
> Components: pipeline
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Fix For: 6.1
>
>
> Right now
> [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl]
> is well over 2000 lines long and extremely difficult to navigate.
> I propose the following:
> * All ENV handling is refactored into a pipeline_environment file
> * All command-line parsing and definitions are refactored into a
> pipeline_cli file
> * Sanity checking is refactored into a pipeline_sanity_check file
> * Dependent variable checking is refactored into a
> pipeline_dependent_variable_setting file
> * Filtering and preprocessing of corpora is refactored into a
> pipeline_filter_preprocess_corpora file
> * pipeline_subsampling becomes a file
> * pipeline_alignment becomes a file
> * pipeline_parsing becomes a file
> * pipeline_thrax becomes a file
> * pipeline_tuning becomes a file
> * pipeline_testing becomes a file
> * pipeline_subroutines becomes a file