On May 27, 2012, at 9:43 PM, Wanmei wrote:
> Hi All,
> I am pretty new to Galaxy. I would like to understand Galaxy's versioning
> capability from an end-user perspective (I do not mean the versioning
> capability that Mercurial offers in the Galaxy repo).
> I did some research and found that the following link mentions a use case: if
> an end-user would like to rerun an analysis which was previously run using a
> different version of the analysis program, then Galaxy will prompt the
> end-user whether he/she would like to proceed with the new analysis. From
> this screenshot (in the link), it looks like Galaxy keeps track of the
> metadata of an output data file, such as which analysis program and which
> version of the code produced it. Is my understanding correct?
You are correct. In Galaxy, the process of providing an input dataset to an
analysis tool creates a Galaxy job. This job keeps information about the tool
that was used, including the version. The result of running the job is the
analysis, consisting of one or more additional datasets. At some later point,
when a Galaxy user attempts to rerun the job, the original job information is
inspected to determine the tool / version combination that was used in the job.
Then the current Galaxy tool box is inspected to see if that tool / version
combination is available in the tool box or if a derivative tool / version
combination is available, allowing the user to rerun the tool with either the
original or the derivative.
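As a rough illustration of that lookup, here is a minimal Python sketch. It is
not Galaxy's actual code: ToolRef, ToolBox, derivative_of and rerun_options are
hypothetical names made up for this example, and it assumes each tool version
records at most one tool version it was derived from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRef:
    """A tool / version combination, as recorded on a job."""
    tool_id: str
    version: str


class ToolBox:
    """Hypothetical stand-in for the tool box (not Galaxy's real API)."""

    def __init__(self):
        self._tools = []          # installed tool / version combinations
        self._derived_from = {}   # installed ToolRef -> ToolRef it derives from

    def add(self, tool, derived_from=None):
        self._tools.append(tool)
        if derived_from is not None:
            self._derived_from[tool] = derived_from

    def has(self, tool):
        return tool in self._tools

    def derivative_of(self, original):
        """Return an installed tool whose lineage leads back to `original`."""
        for candidate in self._tools:
            ancestor = self._derived_from.get(candidate)
            while ancestor is not None:
                if ancestor == original:
                    return candidate
                ancestor = self._derived_from.get(ancestor)
        return None


def rerun_options(recorded, toolbox):
    """List the tool versions a user could choose from when rerunning a job."""
    options = []
    if toolbox.has(recorded):
        options.append(recorded)      # the original combination is still there
    derivative = toolbox.derivative_of(recorded)
    if derivative is not None:
        options.append(derivative)    # a derived combination is available
    return options


# Example: the job recorded version 1.0.0, but only 1.1.0 is installed now.
v1 = ToolRef("filter", "1.0.0")
v2 = ToolRef("filter", "1.1.0")
box = ToolBox()
box.add(v2, derived_from=v1)
print(rerun_options(v1, box))
```

If the original version were also still installed, both combinations would be
returned and the user could pick either one.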
> If the answer to my above question is yes, then I have one more question.
> Does Galaxy version the output data as well? What I mean is, for example, if
> the end-user agrees to use a newer version of the code to rerun (answer Yes
> to Galaxy's prompt), will the newly generated output be marked as version #2
> as opposed to the original output (version #1)? Or will it simply
> overwrite the previous analysis output file?
With each new analysis, new datasets are produced. In no case are previous
datasets overwritten. With the new analysis in your example, the job again has
information about the tool / version combination that produced the dataset.
So, as I described above, the job can be rerun at some later point. The
resulting dataset is not versioned in the way you describe, but information is
kept about the analysis process that produced the resulting dataset.
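To make the "new datasets, never overwritten" behavior concrete, here is a
small Python sketch. Again, this is only an illustration under assumed names
(Job, Dataset and History are hypothetical classes for this example, not
Galaxy's data model): each run appends a new dataset that points back to the
job, and therefore to the tool / version combination, that produced it.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Job:
    """The record of one tool execution (hypothetical, for illustration)."""
    tool_id: str
    tool_version: str


@dataclass(frozen=True)
class Dataset:
    """An output dataset that remembers the job that created it."""
    dataset_id: int
    name: str
    created_by: Job


class History:
    """Append-only collection of datasets; nothing is ever replaced."""

    def __init__(self):
        self._ids = count(1)
        self.datasets = []

    def run(self, tool_id, tool_version, output_name):
        job = Job(tool_id, tool_version)
        dataset = Dataset(next(self._ids), output_name, job)
        self.datasets.append(dataset)   # always a new entry, never an overwrite
        return dataset


history = History()
first = history.run("filter", "1.0.0", "filtered")
second = history.run("filter", "1.1.0", "filtered")   # rerun with a newer tool

for ds in history.datasets:
    print(ds.dataset_id, ds.name, ds.created_by.tool_id, ds.created_by.tool_version)
```

Both datasets remain in the history. They carry no "version #1 / version #2"
label of their own, but the job attached to each one shows which tool version
produced it.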
Greg Von Kuster
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: