Re: GSOC 2015 - Taverna: Language command line tool

Stian Soiland-Reyes Wed, 11 Mar 2015 04:16:15 -0700

... Let's try to keep the technical bit of this thread on dev@taverna
- but feel free to contact me personally for private matters such as
your commitments.

On 10 March 2015 at 19:25, Menaka Madushanka <[email protected]> wrote:

> If I understood correctly,
> I have to implement the generalized version of the command line tools by the
> mid evaluation
> https://github.com/apache/incubator-taverna-language/tree/master/taverna-scufl2-examples
> https://github.com/apache/incubator-taverna-language/tree/master/taverna-scufl2-wfdesc
> https://github.com/stain/ro-combine-archive

Right - that sounds like a good plan. Once we have a prototype up and
running we can see better what fits well.

This should include writing some documentation on command line
options, example usage etc.

> After that update ExecuteWorkflow....

Perhaps we should flesh out a more detailed plan for ExecuteWorkflow.

Having some kind of -verbose mode with proper logging was mentioned -
I can help show you how to hook into the platform to get
notifications. Perhaps this needs some kind of verbosity level, as
some workflows can be very active and thus very noisy if everything
is logged.
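
As a rough illustration of the verbosity idea, here is a minimal sketch
that maps a repeated -v flag to a logging level. The -v flag, the class
name and the level mapping are all assumptions for discussion, not
existing executeworkflow options. It uses java.util.logging only to stay
self-contained; the real tool may well use a different logging framework.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: maps repeated -v flags to a java.util.logging level. */
final class VerbositySketch {

    /** Counts how many times the (assumed) "-v" flag appears in the arguments. */
    static int countVerboseFlags(String[] args) {
        int count = 0;
        for (String arg : args) {
            if ("-v".equals(arg)) {
                count++;
            }
        }
        return count;
    }

    /** More -v flags mean more detail; no flag keeps the output quiet. */
    static Level levelFor(int verbosity) {
        if (verbosity <= 0) {
            return Level.WARNING;
        }
        if (verbosity == 1) {
            return Level.INFO;
        }
        if (verbosity == 2) {
            return Level.FINE;
        }
        return Level.FINEST;
    }

    public static void main(String[] args) {
        Level level = levelFor(countVerboseFlags(args));

        // The handler has its own threshold, so set it as well as the logger's.
        Logger logger = Logger.getLogger("executeworkflow.sketch");
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        logger.setUseParentHandlers(false);
        logger.addHandler(handler);
        logger.setLevel(level);

        logger.warning("always shown");
        logger.info("shown with -v");
        logger.fine("shown with -v -v");
    }
}
```

A noisy workflow would then only flood the terminal when the user asks
for it, while warnings stay visible by default.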

Have you got any other suggestions? For instance on simplifying the
command line options and perhaps moving some of these to a config
file? Making it a bit more unix-like perhaps.
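
To make the config file idea concrete, here is a minimal sketch of one
possible precedence order: built-in defaults, then a config file, then
explicit command line options. The class and method names are
hypothetical; the keys and default values are borrowed from the
-dbproperties example in the help output below, and the rule that
command line options win mirrors the note there.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: defaults < config file < command line options. */
final class ConfigPrecedenceSketch {

    static Properties resolve(Reader configFile, Map<String, String> commandLine) {
        // Lowest priority: built-in defaults (values taken from the help text).
        Properties resolved = new Properties();
        resolved.setProperty("in_memory", "true");
        resolved.setProperty("provenance", "false");
        resolved.setProperty("port", "1527");

        // Middle priority: whatever the user put in the config file.
        Properties fromFile = new Properties();
        try {
            fromFile.load(configFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        resolved.putAll(fromFile);

        // Highest priority: options given explicitly on the command line.
        resolved.putAll(commandLine);
        return resolved;
    }

    public static void main(String[] args) {
        Reader file = new StringReader("port = 1528\nprovenance = true\n");
        Properties p = resolve(file, Map.of("port", "1600"));
        System.out.println(p.getProperty("port"));       // prints 1600
        System.out.println(p.getProperty("provenance")); // prints true
        System.out.println(p.getProperty("in_memory"));  // prints true
    }
}
```

With something like this, rarely changed settings could live in the file
and the command line would only carry the per-run options.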

stain@biggie-utopic:~/src/taverna/incubator-taverna-commandline/taverna-commandline-product/target/apache-taverna-commandline-3.1.0.incubating-SNAPSHOT-dev/apache-taverna-commandline-3.1.0.incubating-SNAPSHOT$
./executeworkflow.sh --help
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=400m;
support was removed in 8.0
usage: executeworkflow [options] [workflow]
 -bundle <bundle>                       Save outputs to a new Workflow
                                        Run Bundle (zip).
 -clientserver                          Connect as a client to a derby
                                        server instance.
 -cmdir <directory path>                Absolute path to a directory
                                        where Credential Manager's files
                                        (keystore and truststore) are
                                        located.
 -cmpassword                            Indicate that the master password
                                        for Credential Manager will be
                                        provided on standard input.
 -dbproperties <filename>               Load a properties file to
                                        configure the database.
 -embedded                              Connect to an embedded Derby
                                        database. This can prevent
                                        mulitple invocations.
 -help                                  Display comprehensive help
                                        information.
 -inmemory                              Run the workflow with data stored
                                        in-memory rather than in a
                                        database (this is the default
                                        option). This can give
                                        performance inprovements, at the
                                        cost of overall memory usage.
 -inputdelimiter <inputname delimiter>  Cause an inputvalue or inputfile
                                        to be split into a list according
                                        to the delimiter. The associated
                                        workflow input must be expected
                                        to receive a list.
 -inputdoc <document>                   Load inputs from a Baclava
                                        document.
 -inputfile <inputname filename>        Load the named input from file or
                                        URL.
 -inputvalue <inputname value>          Directly use the value for the
                                        named input.
 -logfile <filename>                    The logfile to which more verbose
                                        logging will be written to.
 -outputdir <directory>                 Save outputs as files in
                                        directory, default is to make a
                                        new directory
                                        workflowName_output.
 -port <portnumber>                     The port that the database is
                                        running on. If set requested to
                                        start its own internal server,
                                        this is the start port that will
                                        be used.
 -provenance                            Generate provenance information
                                        and store it in the database.
 -startdb                               Automatically start an internal
                                        Derby database server.
By default, the workflow is executed using the -inmemory option, and the
results are written out to a directory named after the workflow name.

If this directory already exists then a new directory is created, and
appended with _<n>, where n is incremented to the next available index.

Results are written out to files named after the output port for that result.
If a result is composed of lists, then a directory is created for the output
port and individual list items are named after the list element index (with 1
being the first index). The the output is the result of an error, the filename
is appended with '.error'.

You can provide your own output directory with the -outputdir option. There
will be an error if the directory already exists.

You can also record your results to a Baclava document using -outputdoc
option. The document will be overwritten if it already exists.

Inputs can be provided in three ways. Both -inputfile and -inputvalue options
can be used together; -inputdoc option must be used on its own. -inputfile and
-inputvalue options both take two additional arguments, the name of the port
for the input, and either a file containing the input data, or the input value
itself respectively.

If one of more of your workflow inputs is a list, you can create a list
input by using the -inputdelimiter option, which may be used with either
-inputfile or -inputvalue. This option takes two parameters - an input name
and the delimiter by which to split the input into a list.

The delimiter may be a simple character, such as a comma or a new-line
character, or a regular expression. The input string, or file, will then be
converted into a list being split by the delimiter specified. Make sure to
put the delimiter character in quotes as it may be interpreted by the shell
as a special character, e.g. ;.

If a list of greater depth (i.e. a list or lists or deeper) is required then
you will need to use the -inputdoc option. However, if you provide an input
of lower depth to that required, then it will automatically be wrapped in one
or more lists up to the required depth. Providing an input of greater depth
than that required will result in an error.

If a workflow has a high memory requirement, then it may be better to run it
using a database to store data rather than storing it in memory, which is the
default option. There are three options for using a database:

-embedded option, runs with an embedded database. This is slightly faster than
the -clientserver option (below), but has the limitation that only one
executeworkflow script may be executed simultaneously.

-clientserver option allows the workflow to be executed backed by the database
running as a server. By default a database is not started for you, but may be
started using -startdb option.

-startdb option starts a database. It may be used without providing a workflow
to allow a database to be started separately, allowing multiple simultaneous
executeworkflow script runs.

More advanced database configurations can be specified using -dbproperties
option, allowing you to take full control over the database used. This takes a
second argument, the filename of the properties file, for which the following
example contains the default settings:

in_memory = true
provenance = false
connector = derby
port = 1527
dialect = org.hibernate.dialect.DerbyDialect
start_derby = false
driver = org.apache.derby.jdbc.EmbeddedDriver
jdbcuri = jdbc:derby:t2-database;create=true;upgrade=true

Note that when using -dbproperties together with other options, the other
options take precedence.

-cmdir option lets you specify an absolute path to a directory where
Credential Manager's files (keystore and truststore - containing user's
credentials and trusted certificates for accessing secure services) are stored.
If not specified and the workflow requires access to these files, Taverna will
try to find them in the default location in <TAVERNA_HOME>/security somewhere
inside user's home directory (depending on the platform).

-cmpassword option can be used to tell Taverna to expect the password for the
Credential Manager on standard input. If the password is not piped in, Taverna
will prompt you for it in the terminal and block until it is entered. Do not
enter your password in the command line! If -cmpassword option is not specified
and -cmdir option is used, Taverna will try to find the password in a special
file password.txt in the directory specified with -cmdir option.
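
The -inputdelimiter behaviour described in the help text above (split a
value into a list by a character or a regular expression) can be
sketched roughly as follows. This is an illustration built on
String.split, with hypothetical class and method names; it is not taken
from the actual executeworkflow code, which may treat edge cases
differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: split an input value into a list by a delimiter regex. */
final class InputDelimiterSketch {

    /** The delimiter is treated as a regular expression, as String.split does. */
    static List<String> splitInput(String value, String delimiterRegex) {
        return Arrays.asList(value.split(delimiterRegex));
    }

    public static void main(String[] args) {
        // A simple character delimiter.
        System.out.println(splitInput("a,b,c", ","));       // prints [a, b, c]
        // A delimiter that the shell would treat specially if left unquoted.
        System.out.println(splitInput("1;2;3", ";"));       // prints [1, 2, 3]
        // A regular expression: one or more whitespace characters.
        System.out.println(splitInput("x  y\tz", "\\s+"));  // prints [x, y, z]
    }
}
```

Note that String.split drops trailing empty strings, so "a,b," yields two
items rather than three; a real implementation would have to decide
whether that is the wanted behaviour.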

--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
