Thank you again, Greg.

Thank you for the link too (although the image in the link is no longer 
accessible). In your example, 4366992 is an auto-generated number appended when a 
user uploads a data file into the Galaxy instance, right? Say the user 
uploads dataset.dat at time t1 and another dataset.dat at time t2: the new file 
does not overwrite the old one, because the newly generated number is different 
from the previous one. Is that correct?

Wanmei




________________________________
 From: Greg Von Kuster <g...@bx.psu.edu>
To: Wanmei <wanmei...@yahoo.com> 
Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu> 
Sent: Monday, May 28, 2012 9:26 AM
Subject: Re: [galaxy-dev] Data file and Analysis Program versioning
 



On May 28, 2012, at 8:34 AM, Wanmei wrote:

Thank you Greg.
>
>
>> This job keeps information about the tool that was used, including the 
>> version.  The results of the job running is the analysis consisting of 
>> one or more additional datasets.
>[Wanmei] Does the job also keep information about which input dataset was 
>used, besides the tool and version?
Yes, the Galaxy reports component ( discussed in the Galaxy news brief at 
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2010%2006_08 ) is a good place to look 
for details about Galaxy jobs.  The reports have not yet been enhanced to 
display tool version information or tool version relationships, but they will 
soon include this information.  Here is some of the job information shown in 
the current reports.  You'll see information about input datasets and resulting 
datasets in the command line.

Job Information

  State:          ok
  Job Id:         3865189
  Create Time:    2012-05-28 00:00:56.419746
  Time To Finish: 0:00:34
  Session Id:     5531371
  Tool:           Filter1
  User:           xxxxxx
  Runner:         pbs://torque.g2.bx.psu.edu/
  Runner Id:      2305392.thumper.g2.bx.psu.edu
  Remote Host:    xxx.xxx.xxx.xxx

  Command Line:
    python /galaxy/home/g2main/galaxy_main/tools/stats/filtering.py
    /galaxy/main_pool/pool1/files/004/366/dataset_4366992.dat
    /galaxy/main_pool/pool5/tmp/job_working_directory/003/865/3865189/galaxy_dataset_4366996.dat
    "c3!=__sq__No results__sq__" 30
    "str,str,str,str,int,float,str,float,str,str,int,float,str,str,int,str,str,str,str,str,str,int,str,str,str,list,str,list,str,str"

  Stdout:
    Filtering with c3!='No results',
    kept 46.58% of 1241 valid lines (1241 total lines).
  Stderr:
  Stack Trace:    None
  Info:           None
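
As a rough illustration of reading the input and output datasets out of a command line like the one above, here is a minimal Python sketch. The regular expression, the function names dataset_ids and guessed_subdir, and the directory rule (id divided by 1000, zero-padded to six digits, split into two groups of three) are assumptions inferred only from the paths in this one report; they are not taken from Galaxy's code or documentation.

```python
import re
import shlex

# Command line copied from the job report above.
COMMAND_LINE = (
    "python /galaxy/home/g2main/galaxy_main/tools/stats/filtering.py "
    "/galaxy/main_pool/pool1/files/004/366/dataset_4366992.dat "
    "/galaxy/main_pool/pool5/tmp/job_working_directory/003/865/3865189/"
    "galaxy_dataset_4366996.dat "
    '"c3!=__sq__No results__sq__" 30 '
    '"str,str,str,str,int,float,str,float,str,str,int,float,str,str,int,'
    'str,str,str,str,str,str,int,str,str,str,list,str,list,str,str"'
)

# Hypothetical pattern: matches file names ending in dataset_<number>.dat
DATASET_RE = re.compile(r"dataset_(\d+)\.dat$")


def dataset_ids(command_line):
    """Return the numeric dataset ids found in the arguments, in order."""
    ids = []
    for arg in shlex.split(command_line):
        match = DATASET_RE.search(arg)
        if match:
            ids.append(int(match.group(1)))
    return ids


def guessed_subdir(numeric_id):
    """Guess the two-level directory for an id (inferred from the example paths)."""
    padded = "%06d" % (numeric_id // 1000)
    return padded[:3] + "/" + padded[3:]


if __name__ == "__main__":
    print(dataset_ids(COMMAND_LINE))
    print(guessed_subdir(4366992))
```

In this report the first id (4366992) belongs to the input dataset and the second (4366996) to the dataset the job produced; the same guessed rule also reproduces the 003/865 part of the job working directory for job 3865189.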


>
>> With each new analysis, new datasets are produced.  In no case are previous 
>> datasets overwritten.  With the new analysis in your example, the job again 
>> has information about the tool / version combination that produced the 
>> dataset.  So, like I described above, the job can be rerun at some later 
>> point.  The resulting dataset is not versioned in the way you describe, but 
>> information is kept about the analysis process that produced the 
>> resulting dataset.
>[Wanmei] I think you mean this for the example we discussed: Galaxy will keep 
>two separate jobs: Job#1 is the previous analysis with the corresponding 
>tool/version/output dataset; Job#2 is the new analysis with the corresponding 
>tool/version/output dataset. Is my understanding correct?
Yes!
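
To make the "two separate jobs" picture concrete, here is a toy Python sketch. It is only an in-memory illustration under stated assumptions: the Job and History classes, their fields, and the version strings "1.0.0" and "1.1.0" are all hypothetical and do not reflect Galaxy's actual data model.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical record of one tool run; frozen so it cannot be edited later."""
    job_id: int
    tool_id: str
    tool_version: str
    input_dataset_ids: Tuple[int, ...]
    output_dataset_ids: Tuple[int, ...]


class History:
    """Toy stand-in for a store of jobs and datasets (not Galaxy's real model)."""

    def __init__(self):
        self._job_ids = count(1)
        self._dataset_ids = count(1)
        self.jobs = []

    def upload(self):
        # Every upload gets a fresh dataset id, even if the file name repeats.
        return next(self._dataset_ids)

    def run(self, tool_id, tool_version, input_dataset_ids):
        # Every run gets a fresh job id and a fresh output dataset id,
        # so earlier jobs and datasets are never overwritten.
        output_id = next(self._dataset_ids)
        job = Job(next(self._job_ids), tool_id, tool_version,
                  tuple(input_dataset_ids), (output_id,))
        self.jobs.append(job)
        return job


history = History()
raw = history.upload()
job1 = history.run("Filter1", "1.0.0", [raw])  # original analysis
job2 = history.run("Filter1", "1.1.0", [raw])  # rerun with a newer tool version
```

Both jobs remain in history.jobs, each with its own tool version and its own output dataset id, which mirrors the Job#1 / Job#2 description above.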




>
>
>
>Thanks,
>Wanmei
>
>
>
>
>________________________________
> From: Greg Von Kuster <g...@bx.psu.edu>
>To: Wanmei <wanmei...@yahoo.com> 
>Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu> 
>Sent: Monday, May 28, 2012 7:13 AM
>Subject: Re: [galaxy-dev] Data file and Analysis Program versioning
> 
>
>Hello Wanmei,
>
>
>On May 27, 2012, at 9:43 PM, Wanmei wrote:
>
>Hi All,
>>
>>
>>I am pretty new to Galaxy. I would like to understand Galaxy's versioning 
>>capability from an end-user perspective (I do not mean the versioning 
>>capability that Mercurial offers in the Galaxy repo).
>>
>>
>>
>>I did some research and found that the following link mentions a use case: if an 
>>end-user would like to rerun an analysis that was previously run using a 
>>different version of the analysis program, then Galaxy will ask the 
>>end-user whether he/she would like to proceed with the new analysis. From 
>>the screenshot (in the link), it looks like Galaxy keeps track of the 
>>metadata of an output data file, such as which analysis program and which 
>>version of the code produced it. Is my understanding correct?
>>
>>http://wiki.g2.bx.psu.edu/Tool%20Shed#Galaxy_Tool_Versions
>
>You are correct.  In Galaxy, the process of providing an input dataset to an 
>analysis tool creates a Galaxy job.  This job keeps information about the tool 
>that was used, including the version.  The results of the job running is the 
>analysis consisting of one or more additional datasets.  At some later point 
>when a Galaxy user attempts to rerun the job, the original job information is 
>inspected to determine the tool / version combination that was used in the 
>job.  Then the current Galaxy tool box is inspected to see if that tool / 
>version combination is available in the tool box or if a derivative tool / 
>version combination is available, allowing the user to rerun the tool with 
>either the original or the derivative.
>
>
>
>>
>>If the answer to my question above is yes, then I have one more question: 
>>does Galaxy version the output data as well? What I mean is, for example, if 
>>the end-user agrees to use a newer version of the code for the rerun (answers Yes 
>>to Galaxy's prompt), will the newly generated output be marked as version #2, 
>>as opposed to the original output (version #1)? Or will it simply 
>>overwrite the previous analysis output file?
>
>With each new analysis, new datasets are produced.  In no case are previous 
>datasets overwritten.  With the new analysis in your example, the job again 
>has information about the tool / version combination that produced the 
>dataset.  So, like I described above, the job can be rerun at some later 
>point.  The resulting dataset is not versioned in the way you describe, but 
>information is kept about the analysis process that produced the resulting 
>dataset.
>
>
>Greg Von Kuster
>
>
>
>>
>>
>>
>>Thanks,
>>Wanmei
>>
>>
>>
>>
>>___________________________________________________________
>>Please keep all replies on the list by using "reply all"
>>in your mail client.  To manage your subscriptions to this
>>and other Galaxy lists, please use the interface at:
>>
>> http://lists.bx.psu.edu/
>
>
>
