Hello,

I'm a Galaxy newbie and am running into several issues trying to adapt
an R script to be a Galaxy tool.

I'm using the XY plotting tool (tools/plot/xy_plot.xml) for guidance,
but I decided not to embed my script in the XML. Instead I keep it in a
separate script file, so that I can still run it from the command line
and make sure it works as I make incremental changes. (So my script
starts with args <- commandArgs(TRUE).) Also, if it works from the
command line but not in Galaxy, that suggests to me that the problem is
with my Galaxy configuration.

First, I tried using the r_wrapper.sh script that comes with the XY
plotting tool, but it ignored my arguments:

An error occurred running this job: ARGUMENT
'/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_4.dat'
__ignored__

ARGUMENT '/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_3.dat'
__ignored__

ARGUMENT 'Fly' __ignored__

ARGUMENT 'Tagwise' __ignored__

etc.

So then I tried just switching to Rscript:

  <command interpreter="bash">Rscript RNASeq.R $countsTsv $designTsv
"$organism" $dispersion $minimumCountsPerMillion
$minimumSamplesPerTranscript $out_file1 $out_file2</command>

(My script produces a CSV file and a PDF file as output. The final two
arguments I'm passing are the names of those files.)

But then I get an error that Rscript can't be found.

So I wrote a little wrapper script, Rscript_wrapper.sh:

#!/bin/sh

Rscript $*
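
A slightly more defensive version of that wrapper can be sketched as
follows. This is an illustration, not the script that was run: the
function name run_rscript and the RSCRIPT_BIN override are made up for
the example. The point is that quoting "$@" passes each argument through
as one word, whereas unquoted $* re-splits any argument that contains
whitespace.

```shell
#!/bin/sh
# Hypothetical variant of Rscript_wrapper.sh (a sketch, not the original).
# RSCRIPT_BIN is an assumed override so another program can stand in for
# Rscript; when it is unset, plain "Rscript" is looked up on the PATH.
run_rscript() {
    # "$@" expands to the arguments exactly as received, one word each,
    # so a value such as "Homo sapiens" stays a single argument.
    "${RSCRIPT_BIN:-Rscript}" "$@"
}
```

In the wrapper file itself, the last line would then be
run_rscript "$@".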

And called that:
  <command interpreter="bash">Rscript_wrapper.sh RNASeq.R $countsTsv
$designTsv "$organism" $dispersion $minimumCountsPerMillion
$minimumSamplesPerTranscript $out_file1 $out_file2</command>

Then I got an error that RNASeq.R could not be found.

So then I added the absolute path to my R script to the <command> tag.
This seemed to work (that is, it got me further, to the next error),
but I'm not sure why I had to do this. In all the other tools I'm
looking at, the directory of the script does not have to be specified;
I assumed that the command would run in the appropriate directory.

So now I've specified the full path to my R script:

  <command interpreter="bash">Rscript_wrapper.sh
/Users/dtenenba/dev/galaxy-dist/tools/bioc/RNASeq.R $countsTsv
$designTsv "$organism" $dispersion $minimumCountsPerMillion
$minimumSamplesPerTranscript $out_file1 $out_file2</command>

And I get the following long error, which includes all of the output
of my R script:

Traceback (most recent call last):
  File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/runners/local.py", line 133, in run_job
    job_wrapper.finish( stdout, stderr )
  File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/__init__.py", line 725, in finish
    self.sa_session.flush()
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/scoping.py", line 127, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/session.py", line 1356, in flush
    self._flush(objects)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/session.py", line 1434, in _flush
    flush_context.execute()
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 261, in execute
    UOWExecutor().execute(self, tasks)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 753, in execute
    self.execute_save_steps(trans, task)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 768, in execute_save_steps
    self.save_objects(trans, task)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 759, in save_objects
    task.mapper._save_obj(task.polymorphic_tosave_objects, trans)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/mapper.py", line 1413, in _save_obj
    c = connection.execute(statement.values(value_params), params)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 824, in execute
    return Connection.executors[c](self, object, multiparams, params)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 874, in _execute_clauseelement
    return self.__execute_context(context)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 896, in __execute_context
    self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 950, in _cursor_execute
    self._handle_dbapi_exception(e, statement, parameters, cursor, context)
  File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 931, in _handle_dbapi_exception
    raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
ProgrammingError: (ProgrammingError) You must not use 8-bit
bytestrings unless you use a text_factory that can interpret 8-bit
bytestrings (like text_factory = str). It is highly recommended that
you instead just switch your application to Unicode strings. u'UPDATE
job SET update_time=?, stdout=?, stderr=? WHERE job.id = ?'
['2012-04-24 18:55:45.791417', '', 'BiocInstaller version 1.5.7,
?biocLite for help\nWarning message:\nNAs introduced by coercion
\nLoading required package: methods\nLoading required package:
limma\nLoading required package: BiasedUrn\nLoading required package:
geneLenDataBase\nLoading required package: org.Dm.eg.db\nLoading
required package: AnnotationDbi\nLoading required package:
BiocGenerics\n\nAttaching package:
\xe2\x80\x98BiocGenerics\xe2\x80\x99\n\nThe following object(s) are
masked from \xe2\x80\x98package:stats\xe2\x80\x99:\n\n    xtabs\n\nThe
following object(s) are masked from
\xe2\x80\x98package:base\xe2\x80\x99:\n\n    anyDuplicated, cbind,
colnames, duplicated, eval, Filter, Find,\n    get, intersect, lapply,
Map, mapply, mget, order, paste, pmax,\n    pmax.int, pmin, pmin.int,
Position, rbind, Reduce, rep.int,\n    rownames, sapply, setdiff,
table, tapply, union, unique\n\nLoading required package:
Biobase\nWelcome to Bioconductor\n\n    Vignettes contain introductory
material; view with\n    \'browseVignettes()\'. To cite Bioconductor,
see\n    \'citation("Biobase")\', and for packages
\'citation("pkgname")\'.\n\nLoading required package:
DBI\n\nCalculating library sizes from column totals.\nError in
matrix(u, nrow = nrows, byrow = TRUE) : \n  negative extents to
matrix\nCalls: plotMDS.DGEList ... equalizeLibSizes -> splitIntoGroups
-> lapply -> FUN -> matrix\nExecution halted\n', 15]

Note that if I run my script from the command line:

./Rscript_wrapper.sh RNASeq.R \
  /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_4.dat \
  /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_3.dat \
  Fly 1 1 Tagwise MDSPlot.pdf outputs.csv

It works fine and does not produce a warning about "NAs introduced by
coercion", nor does it fail with the "Error in matrix" above.
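
To compare what Galaxy passes with what the command line passes, the
wrapper could print each argument it receives before calling Rscript.
A minimal sketch, where the function name log_args is made up for
illustration; it would show, for example, whether 'Fly' and 'Tagwise'
arrive in the same positions in both cases:

```shell
# Hypothetical debugging helper (not part of Galaxy or of the original
# wrapper): print every argument on its own numbered line to stderr.
# The brackets make leading or trailing whitespace visible.
log_args() {
    i=1
    for arg in "$@"; do
        printf 'arg %d: [%s]\n' "$i" "$arg" >&2
        i=$((i + 1))
    done
}
```

Calling log_args "$@" at the top of the wrapper, once under Galaxy and
once from the command line, gives two listings that can be compared
line by line.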

So, can anyone tell me what is going wrong here? Why does R behave
differently in Galaxy than it does on the command line? (I'm using the
same instance of R, on the same machine, for both my Galaxy and
command-line efforts.) Is this 8-bit bytestring error a red herring?
Can I filter the output so that Galaxy is happy?
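
On the filtering question, one possibility is to strip everything
outside plain ASCII from R's output before Galaxy stores it; the
\xe2\x80\x98 and \xe2\x80\x99 sequences in the error above are the UTF-8
bytes of curly quotes. A sketch under that assumption (the function name
ascii_only is made up, and whether this is enough to satisfy Galaxy is
untested):

```shell
# Hypothetical filter: read stdin and keep only tab, newline, carriage
# return and printable ASCII (octal 40 to 176); every other byte is
# dropped. LC_ALL=C makes tr work on raw bytes rather than characters.
ascii_only() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}
```

In the wrapper this could be used as Rscript "$@" 2>&1 | ascii_only,
at the cost of merging stderr into stdout and hiding Rscript's exit
status behind that of the filter.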

Finally, one other curiosity. Every time I hit "Execute" in Galaxy to
run my tool, it is run twice: two jobs are created, and each fails in
the same way. Why is this?

My R script:
https://gist.github.com/2482783

My XML file:
https://gist.github.com/2482792

I can share more data (such as sample input files) if necessary.

Thanks for your help.
Dan