On May 28, 2012, at 2:03 PM, Wanmei wrote:

> Thank you again, Greg.
> 
> Thank you for the link too (however, the image at that link is no longer 
> accessible). In your example, 4366992 is an auto-generated number appended 
> when a user uploads a data file into the Galaxy instance, right? Say the user 
> uploads dataset.dat at time t1 and another dataset.dat at time t2: the new 
> file does not overwrite the old one, because the newly generated number is 
> different from the previous one. Is that correct?

Yes, that is correct.  At no time do datasets overwrite datasets that were 
previously generated.
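
To illustrate, here is a minimal Python sketch of an append-only store in which every upload receives a fresh, auto-incremented id, so two uploads named dataset.dat never collide. The `DatasetStore` and `Dataset` names are hypothetical and only model the idea; they are not Galaxy's API. The starting id 4366992 is borrowed from the example path discussed in this thread.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    id: int          # auto-generated, never reused
    name: str        # user-visible name, e.g. "dataset.dat"
    content: bytes


class DatasetStore:
    """Hypothetical append-only store: uploads never replace earlier datasets."""

    def __init__(self, start_id: int = 1) -> None:
        self._ids = itertools.count(start_id)
        self._datasets: dict[int, Dataset] = {}

    def upload(self, name: str, content: bytes) -> Dataset:
        # Each upload gets a new id, even if its name matches an earlier upload.
        ds = Dataset(next(self._ids), name, content)
        self._datasets[ds.id] = ds
        return ds

    def get(self, dataset_id: int) -> Dataset:
        return self._datasets[dataset_id]


store = DatasetStore(start_id=4366992)
first = store.upload("dataset.dat", b"uploaded at t1")
second = store.upload("dataset.dat", b"uploaded at t2")
print(first.id, second.id)           # 4366992 4366993
print(store.get(first.id).content)   # b'uploaded at t1'
```

The earlier upload stays retrievable by its own id after the second upload, which is the "no overwrite" property described above.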


> 
> Wanmei
> 
> 
> From: Greg Von Kuster <g...@bx.psu.edu>
> To: Wanmei <wanmei...@yahoo.com> 
> Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu> 
> Sent: Monday, May 28, 2012 9:26 AM
> Subject: Re: [galaxy-dev] Data file and Analysis Program versioning
> 
> 
> On May 28, 2012, at 8:34 AM, Wanmei wrote:
> 
>> Thank you Greg.
>> 
>> > This job keeps information about the tool that was used, including the 
>> > version.  The result of running the job is an analysis consisting of one 
>> > or more additional datasets.
>> [Wanmei] Does the job also keep information about which input dataset was 
>> used, besides the tool and version?
> 
> Yes, the Galaxy reports component (discussed in the Galaxy news brief at 
> http://wiki.g2.bx.psu.edu/DevNewsBriefs/2010%2006_08) is a good place to 
> look for details about Galaxy jobs.  The reports have not yet been enhanced 
> to display tool version information or tool version relationships, but they 
> will soon include this information.  Here is some of the job information 
> shown in the current reports.  You'll see information about input datasets 
> and resulting datasets in the command line.
> 
> Job Information
> 
> State  Job Id   Create Time                 Time To Finish  Session Id
> ok     3865189  2012-05-28 00:00:56.419746  0:00:34         5531371
> 
> Tool     User    Runner                       Runner Id
> Filter1  xxxxxx  pbs://torque.g2.bx.psu.edu/  2305392.thumper.g2.bx.psu.edu
> 
> Remote Host
> xxx.xxx.xxx.xxx
> 
> Command Line
> python /galaxy/home/g2main/galaxy_main/tools/stats/filtering.py 
>   /galaxy/main_pool/pool1/files/004/366/dataset_4366992.dat 
>   /galaxy/main_pool/pool5/tmp/job_working_directory/003/865/3865189/galaxy_dataset_4366996.dat 
>   "c3!=__sq__No results__sq__" 30 
>   "str,str,str,str,int,float,str,float,str,str,int,float,str,str,int,str,str,str,str,str,str,int,str,str,str,list,str,list,str,str"
> 
> Stdout
> Filtering with c3!='No results', 
> kept 46.58% of 1241 valid lines (1241 total lines).
> 
> Stderr
> 
> Stack Trace
> None
> 
> Info
> None
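
The two paths in the report above suggest how ids map to directories: input dataset 4366992 lives under `files/004/366/`, and job 3865189 works under `job_working_directory/003/865/3865189/`. Both are consistent with dividing the id by 1000, zero-padding the result to six digits, and splitting it into two three-digit parts. This is an inference from these two examples only, not a description of Galaxy's code; the helper names below are hypothetical.

```python
def id_to_dir(object_id: int) -> str:
    """Hypothetical helper: 4366992 -> '004/366' (inferred from the report's paths)."""
    padded = f"{object_id // 1000:06d}"
    return f"{padded[:3]}/{padded[3:]}"


def dataset_path(root: str, dataset_id: int) -> str:
    # Mirrors the shape of the input path in the command line above.
    return f"{root}/{id_to_dir(dataset_id)}/dataset_{dataset_id}.dat"


def job_dir(root: str, job_id: int) -> str:
    # Mirrors the shape of the job working directory in the command line above.
    return f"{root}/{id_to_dir(job_id)}/{job_id}"


print(dataset_path("/galaxy/main_pool/pool1/files", 4366992))
# /galaxy/main_pool/pool1/files/004/366/dataset_4366992.dat
print(job_dir("/galaxy/main_pool/pool5/tmp/job_working_directory", 3865189))
# /galaxy/main_pool/pool5/tmp/job_working_directory/003/865/3865189
```

Because every new dataset gets a new id, it also gets its own file path, which is another way of seeing why an upload or a rerun never overwrites an earlier file.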
> 
>> 
>> > With each new analysis, new datasets are produced.  In no case are 
>> > previous datasets overwritten.  With the new analysis in your example, the 
>> > job again has information about the tool / version combination that 
>> > produced the dataset.  So, like I described above, the job can be rerun at 
>> > some later point.  The resulting dataset is not versioned in the way you 
>> > describe, but information is kept about the analysis process that produced 
>> > the resulting dataset.
>> [Wanmei] For the example we discussed, I think you mean that Galaxy will 
>> keep two separate jobs: Job #1 is the previous analysis with its tool, 
>> version and output dataset, and Job #2 is the new analysis with its own 
>> tool, version and output dataset. Is my understanding correct?
> 
> Yes!
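
The two-job picture confirmed here can be sketched in Python, assuming a job record that stores the tool id, the tool version, and the ids of its input and output datasets. The `Job` class, the `producer_of` helper, the version strings, and the second output id are hypothetical; the tool id `Filter1` and the ids 4366992 and 4366996 are borrowed from the report earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    tool_id: str
    tool_version: str
    input_ids: tuple[int, ...]
    output_ids: tuple[int, ...]


def producer_of(dataset_id: int, jobs: list[Job]) -> Job:
    """Provenance lookup: return the job whose outputs include dataset_id.

    Raises StopIteration if no job produced that dataset.
    """
    return next(job for job in jobs if dataset_id in job.output_ids)


# Job #1: the original analysis.
job1 = Job(1, "Filter1", "1.0.0", input_ids=(4366992,), output_ids=(4366996,))
# Job #2: the same input rerun with a newer tool version; it writes a new dataset.
job2 = Job(2, "Filter1", "1.1.0", input_ids=(4366992,), output_ids=(4367000,))

history = [job1, job2]
for job in history:
    print(job.job_id, job.tool_version, job.output_ids)
print(producer_of(4367000, history).tool_version)
```

Neither output is labelled "version #1" or "version #2"; instead, each dataset can be traced back to the job, tool, and tool version that produced it.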
> 
> 
>> 
>> 
>> Thanks,
>> Wanmei
>> 
>> From: Greg Von Kuster <g...@bx.psu.edu>
>> To: Wanmei <wanmei...@yahoo.com> 
>> Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu> 
>> Sent: Monday, May 28, 2012 7:13 AM
>> Subject: Re: [galaxy-dev] Data file and Analysis Program versioning
>> 
>> Hello Wanmei,
>> 
>> On May 27, 2012, at 9:43 PM, Wanmei wrote:
>> 
>>> Hi All,
>>> 
>>> I am pretty new to Galaxy. I would like to understand Galaxy's versioning 
>>> capability from an end-user perspective (I do not mean the versioning 
>>> capability that Mercurial offers in the Galaxy repo).
>>> 
>>> I did some research and found that the following link describes a use case: 
>>> if an end user would like to rerun an analysis that was previously run with 
>>> a different version of the analysis program, Galaxy will prompt the end 
>>> user to confirm whether he/she would like to proceed with the new analysis. 
>>> From the screenshot at the link, it looks like Galaxy keeps track of the 
>>> metadata of an output data file, such as which analysis program and which 
>>> version of the code produced it. Is my understanding correct?
>>> http://wiki.g2.bx.psu.edu/Tool%20Shed#Galaxy_Tool_Versions
>> 
>> You are correct.  In Galaxy, the process of providing an input dataset to an 
>> analysis tool creates a Galaxy job.  This job keeps information about the 
>> tool that was used, including the version.  The result of running the job 
>> is an analysis consisting of one or more additional datasets.  At some 
>> later point when a Galaxy user attempts to rerun the job, the original job 
>> information is inspected to determine the tool / version combination that 
>> was used in the job.  Then the current Galaxy tool box is inspected to see 
>> if that tool / version combination is available in the tool box or if a 
>> derivative tool / version combination is available, allowing the user to 
>> rerun the tool with either the original or the derivative.
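
The rerun check described above can be sketched in Python, assuming the tool box is a mapping from tool id to installed version strings and that a "derivative" is simply any newer installed version of the same tool id. Both assumptions are hypothetical simplifications; Galaxy's actual tool version lineage tracking may differ, and the `rerun_options` and `parse_version` names are illustrative only.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.1.0' into (1, 1, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def rerun_options(
    tool_id: str, original_version: str, toolbox: dict[str, list[str]]
) -> tuple[bool, list[str]]:
    """Report whether the original version is installed and which newer ones are.

    Hypothetical model: a "derivative" is any installed version of the same
    tool id that is newer than the original one.
    """
    installed = toolbox.get(tool_id, [])
    original_available = original_version in installed
    derivatives = sorted(
        (v for v in installed if parse_version(v) > parse_version(original_version)),
        key=parse_version,
    )
    return original_available, derivatives


toolbox = {"Filter1": ["1.0.0", "1.1.0"]}
print(rerun_options("Filter1", "1.0.0", toolbox))  # (True, ['1.1.0'])
print(rerun_options("Filter1", "0.9.0", toolbox))  # (False, ['1.0.0', '1.1.0'])
```

In the first call the user could rerun with either the original version or the derivative; in the second, only derivatives remain, which is when a prompt like the one described in the linked page would matter.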
>> 
>>> 
>>> If the answer to my question above is yes, then I have one more question: 
>>> does Galaxy version the output data as well? What I mean is, for example, 
>>> if the end user agrees to use a newer version of the code to rerun (answers 
>>> Yes to Galaxy's prompt), will the newly generated output be marked as 
>>> version #2 as opposed to the original output (version #1)? Or will it 
>>> simply overwrite the previous analysis output file?
>> 
>> With each new analysis, new datasets are produced.  In no case are previous 
>> datasets overwritten.  With the new analysis in your example, the job again 
>> has information about the tool / version combination that produced the 
>> dataset.  So, like I described above, the job can be rerun at some later 
>> point.  The resulting dataset is not versioned in the way you describe, but 
>> information is kept about the analysis process that produced the resulting 
>> dataset.
>> 
>> Greg Von Kuster
>> 
>>> 
>>> 
>>> Thanks,
>>> Wanmei
>>> 
>>> 
>>> ___________________________________________________________
>>> Please keep all replies on the list by using "reply all"
>>> in your mail client.  To manage your subscriptions to this
>>> and other Galaxy lists, please use the interface at:
>>> 
>>>  http://lists.bx.psu.edu/
>> 
>> 
>> 
> 
> 
> 
