Hello,

Our group is investigating using Galaxy as a workflow engine for NLP (Natural 
Language Processing) tasks. I have installed a local Galaxy instance and 
created wrappers for the services we use, and so far everything is working 
great. I do have a few questions, and they all fall under the “Advanced Topics” 
section at the end of the tutorial on creating a Histogram tool [1].

1. parameter validation: 

Many of our tools rely on additions made by previous tools in the workflow; for 
example, a tool that identifies noun phrases may require that the input has 
been run through a part of speech (POS) tagger, the POS tagger may require that 
the input has been run through a tokenizer, etc. Our tools can already do this 
validation; I am just looking for a way to wire it into Galaxy so that a user 
can only connect tools in the workflow editor if the validation passes.
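To make the dependency chain concrete, here is a minimal Python sketch of the 
kind of check our tools already perform. It assumes, hypothetically, that each 
document records the annotation layers it contains (e.g. "tokens", "pos") and 
that each tool declares which layers it requires and which it adds. The tool 
and layer names are illustrative only and do not come from Galaxy.

```python
from typing import Dict, List, Set

# Hypothetical tool and layer names, for illustration only (not Galaxy APIs).
REQUIRES: Dict[str, Set[str]] = {
    "tokenizer": set(),
    "pos_tagger": {"tokens"},
    "np_chunker": {"tokens", "pos"},
}
PRODUCES: Dict[str, str] = {
    "tokenizer": "tokens",
    "pos_tagger": "pos",
    "np_chunker": "noun_phrases",
}


def missing_layers(tool: str, layers: Set[str]) -> Set[str]:
    """Return the annotation layers `tool` needs that are not in `layers`."""
    return REQUIRES[tool] - layers


def can_connect(upstream: List[str], tool: str) -> bool:
    """True if running the `upstream` tools in order gives `tool` valid input."""
    layers: Set[str] = set()
    for step in upstream:
        if missing_layers(step, layers):  # an earlier link is already invalid
            return False
        layers.add(PRODUCES[step])
    return not missing_layers(tool, layers)
```

With this, `can_connect(["tokenizer", "pos_tagger"], "np_chunker")` is True, 
while `can_connect(["tokenizer"], "np_chunker")` is False because the "pos" 
layer is missing. What I lack is a place in Galaxy to hook a check like this 
into the workflow editor.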

I have been looking at the code in 
lib/galaxy/tools/parameters/validation.py, and I don’t see anything that I can 
(easily) bend to our use case. What I was hoping for was something like:

        <input type="data" format="our_custom_format" name="input">
                <validator type="dataset_custom">
                        <command interpreter="bash">validate.sh $input</command>
                        <!-- OR -->
                        <tool file="custom_validator.xml"/>
                </validator>
        </input>

I also see the tantalizing sentence, "Custom code execution at various time 
points of the workflow that allows a fine grained control over the execution 
process", but I can't find any examples of how this is done.


2. data repositories / data collections

I need to be able to process collections of data pulled from remote servers. I 
have been looking at DataManagers and data collections in Galaxy, but 
everything seems to assume the data is local to the server, or can be 
copied/uploaded to the server. For practical and legal reasons beyond my pay 
grade, this is not an option in our case. For example, an organization may be 
willing to allow our users to query their service for documents, run the 
documents through our workflow, and store the intermediate results, but they 
will not allow us to copy their data verbatim to another server. I may be able 
to cache some data, but the general use case is that I have to call an 
external service to fetch documents one at a time and then run the same 
workflow on each document.

Any suggestions on how to accomplish this in Galaxy? I can already process 
single documents; I just need to extend this to collections of documents. A 
typical workflow might look something like:

a) Query Tool -> Server: find all documents that contain the word “cheese”
b) Server -> Here is the list of document IDs [ id1, id2, …, idn ]
c) Workflow -> for each id in the list do
        c1) Download document
        c2) Work work work work…
        c3) Persist output

I can do all of the above except the most important bit: iterating.
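Outside of Galaxy, the iteration I am after is just a loop. The Python sketch 
below mirrors steps a) through c3). The callables `query_server`, `download`, 
`process` and `persist` are hypothetical placeholders for our remote search 
service, the single-document fetch, our NLP pipeline and our result store; 
they are stubbed with in-memory data here so the control flow is visible.

```python
from typing import Callable, Dict, List


def run_per_document(
    query: str,
    query_server: Callable[[str], List[str]],
    download: Callable[[str], str],
    process: Callable[[str], str],
    persist: Callable[[str, str], None],
) -> int:
    """Run the same pipeline on every document matching `query`.

    Each document is held in memory only while it is processed; only the
    processed output is handed to `persist`. Returns the number processed.
    """
    count = 0
    for doc_id in query_server(query):  # a) + b): list of document IDs
        text = download(doc_id)         # c1) fetch one document
        result = process(text)          # c2) the NLP workflow
        persist(doc_id, result)         # c3) keep only the output
        count += 1
    return count


# Stubbed example: an in-memory "remote" corpus and result store.
corpus = {"id1": "I like cheese", "id2": "no dairy here", "id3": "cheese again"}
results: Dict[str, str] = {}

n = run_per_document(
    "cheese",
    query_server=lambda q: [i for i, t in corpus.items() if q in t],
    download=lambda doc_id: corpus[doc_id],
    process=lambda text: text.upper(),
    persist=lambda doc_id, out: results.__setitem__(doc_id, out),
)
```

Here `n` is 2, and only the processed text for id1 and id3 ends up in 
`results`. The question is how to express this "for each id" step inside a 
Galaxy workflow.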


3. format conversion:  

Is it possible for Galaxy to automatically convert between formats when I am 
designing a workflow? I see the <change_format/> tag, but that seems to change 
the output format of a tool based on the input (or some other condition) in the 
same tool; I need to be able to change the format based on the input 
requirements of the next tool in the workflow. For example, if Tool A produces 
format X, Tool B requires format Y, and a converter from X to Y has been 
defined in datatypes_conf.xml, I would like Galaxy to implicitly insert 
the converter from X to Y when I drag the output noodle from Tool A to Tool B 
in the designer. Is this possible?
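Roughly, the behavior I have in mind is: treat the converters registered in 
datatypes_conf.xml as (source, target) edges, search for a chain of converters 
from Tool A's output format to Tool B's input format, and insert one converter 
step per link when the noodle is connected. Below is a minimal breadth-first 
search sketch in Python; the format and converter names are hypothetical, and 
this is not a description of how Galaxy implements anything.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical registry: (source format, target format) -> converter tool id.
Converters = Dict[Tuple[str, str], str]


def converter_chain(src: str, dst: str, converters: Converters) -> Optional[List[str]]:
    """Return the shortest list of converter tools turning `src` into `dst`.

    Returns [] if no conversion is needed, or None if no chain exists.
    """
    if src == dst:
        return []
    # Adjacency list: format -> [(next format, converter tool)]
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for (a, b), tool in converters.items():
        graph.setdefault(a, []).append((b, tool))
    queue: Deque[Tuple[str, List[str]]] = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, path = queue.popleft()
        for nxt, tool in graph.get(fmt, []):
            if nxt in seen:
                continue
            if nxt == dst:  # BFS order guarantees this chain is shortest
                return path + [tool]
            seen.add(nxt)
            queue.append((nxt, path + [tool]))
    return None


converters: Converters = {("x", "y"): "x_to_y", ("y", "z"): "y_to_z"}
```

For example, `converter_chain("x", "z", converters)` yields 
`["x_to_y", "y_to_z"]`, i.e. two steps the editor would insert between Tool A 
and Tool B, while `converter_chain("z", "x", converters)` is None, so the 
connection would be refused.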


4. OAuth 2.0 / OpenID Connect: 

I need to be able to fetch documents from data providers that require an OAuth 
2.0 access token. Currently, I use a separate service to go through the OAuth 
authentication/authorization process and then have the user copy/paste their 
access token into a text field in Galaxy. Is there a way to perform the OAuth 
authentication dance required by the remote service inside Galaxy itself? 
I’ve looked at the Trello board for Galaxy and see that both OAuth 2.0 and 
OpenID Connect are on the radar; hopefully this use case is being considered 
as well.
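For reference, the dance my separate service performs is the standard OAuth 
2.0 authorization code flow (RFC 6749, section 4.1). The Python sketch below 
shows the requests Galaxy would need to build; the endpoint URLs, client ID, 
redirect URI and scope are hypothetical placeholders. A real implementation 
would also have to POST the token request over HTTPS, authenticate as a 
confidential client (e.g. with a client secret) and store the token securely.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical provider endpoints and client registration.
AUTHORIZE_URL = "https://provider.example.org/oauth/authorize"
TOKEN_URL = "https://provider.example.org/oauth/token"
CLIENT_ID = "galaxy-nlp"
REDIRECT_URI = "https://galaxy.example.org/oauth/callback"


def authorization_url(scope: str, state: str) -> str:
    """Step 1: URL the user's browser is sent to for login and consent."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,  # random value echoed back; guards against CSRF
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def code_from_callback(callback_url: str, expected_state: str) -> str:
    """Step 2: pull the authorization code out of the redirect back to Galaxy."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state") != [expected_state]:
        raise ValueError("state mismatch; possible forged callback")
    return query["code"][0]


def token_request_body(code: str) -> str:
    """Step 3: form body to POST to TOKEN_URL, exchanging the code for a token.

    Client authentication (e.g. a client secret) is omitted in this sketch.
    """
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    })


state = secrets.token_urlsafe(16)
url = authorization_url("documents.read", state)
```

The access token in the provider's response to step 3 is what our users 
currently copy and paste into a text field by hand.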


I’m sure I will have more questions after working through some visualization 
examples, but this should keep me busy for now.

Sincerely,
Keith Suderman

REFERENCES

1. https://wiki.galaxyproject.org/Admin/Tools/AddingTools

------------------------------
Research Associate
Department of Computer Science
Vassar College
Poughkeepsie, NY

