Hi Dhanushki,

Thank you for providing the original data for our testing. I worked with the first dataset (labeled "1" by you): the 3.5GB .bam file that transferred completely by FTP but, once loaded into a history, ended up with a size of 2.5GB.


First I should explain that when BAM data is loaded into a Galaxy history, two things occur:
1 - the file is sorted using Samtools sort
2 - the file is indexed to create the .bam.bai
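
As a rough local equivalent of those two steps, here is a minimal sketch. It is not Galaxy's actual code; the function name sort_and_index is made up for illustration, and the 'sort -o' syntax assumes a samtools 1.x release rather than the older 'samtools sort in.bam out_prefix' form.

```shell
# Hypothetical helper, not part of Galaxy or samtools.
# Usage: sort_and_index input.bam sorted.bam
sort_and_index() (
    # Step 1: sort the alignments into a new BAM file (samtools 1.x syntax).
    samtools sort -o "$2" "$1" || exit 1
    # Step 2: build the index for the sorted file (typically sorted.bam.bai).
    samtools index "$2"
)
```

For example: sort_and_index original.bam original.sorted.bam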

Next, I can confirm that the 2.5GB file loaded into the history is the complete original dataset. The difference in size is due to the sorting and the new Samtools compression. I am not sure what tools you used to create the data, but I was incorrect in stating that a .bam file would be unlikely to decrease so significantly in size. Here is how this was confirmed:

I verified the content two ways:

1 - I counted the alignments in original.bam and history.bam using 'samtools view -c'. Both counts were the same:

$ samtools view -c original.bam
43232174
$ samtools view -c history.bam
43232174
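
If you want to repeat this count check on several files, it could be wrapped in a small function along these lines. This is only a sketch: same_alignment_count is a name I made up, and it relies on nothing beyond the 'samtools view -c' call shown above.

```shell
# Hypothetical helper: succeeds only if both BAM files report the same
# number of alignments.
# Usage: same_alignment_count first.bam second.bam
same_alignment_count() (
    count_a=$(samtools view -c "$1") || exit 2
    count_b=$(samtools view -c "$2") || exit 2
    echo "$1: $count_a"
    echo "$2: $count_b"
    # The exit status of this comparison is the function's result.
    [ "$count_a" = "$count_b" ]
)
```

For example: same_alignment_count original.bam history.bam && echo "counts match"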

2 - I directly compared the content. Because the history.bam file was sorted by the process that loaded it into the history, I decided to 'samtools sort' the original.bam file as well, so that I could compare.

$ samtools sort original.bam original.bam.sorted

At this point, the sorted copy of original.bam was 2.5GB, down from the original 3.5GB. In other words, it is the sorting by samtools that reduced the overall size of the file.

But I wanted to go one step further and directly compare the exact contents. So, I used 'samtools view' to extract the alignments and then ran 'diff', which reports even a single-character difference between files.

$ samtools view original.bam.sorted.bam > original.bam.sorted.view
$ samtools view history.bam > history.bam.view
$ diff original.bam.sorted.view history.bam.view > diff.out
$ more diff.out
< nothing, meaning no differences, exactly the same content >
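
For very large files, the intermediate .view text files can take up a lot of disk space. One alternative, sketched below, is to compare checksums of the two 'samtools view' streams instead. Please treat it as an assumption-laden shortcut: same_alignment_content is a made-up name, both files are assumed to be sorted the same way, and a matching checksum is strong evidence rather than the line-by-line proof that diff gives.

```shell
# Hypothetical helper: succeeds only if the alignment text of both BAM
# files has the same checksum and byte count (as reported by cksum).
# Usage: same_alignment_content first.sorted.bam second.sorted.bam
same_alignment_content() (
    # cksum reads the stream from stdin, so no temporary file is written.
    sum_a=$(samtools view "$1" | cksum)
    sum_b=$(samtools view "$2" | cksum)
    [ "$sum_a" = "$sum_b" ]
)
```

If the function reports a mismatch, the diff approach above will show exactly which lines differ.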

You can run the same checks locally on any of your other files from a terminal prompt, once you have samtools installed. To download a large Galaxy dataset from a history, use the following steps (currently, 'wget' is not a fetching option):

1. right click on the disk icon for the dataset and 'copy link location'
2. type into the terminal prompt

$ curl -0 'paste_in_the_copied_link_location'  >  filename.out
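
A slightly more defensive variant is sketched below. The function name fetch_dataset is hypothetical, and I am assuming the copied link can be fetched without any extra login step. The curl options used are --fail, --location, and --output.

```shell
# Hypothetical helper, shown only as a sketch.
# Usage: fetch_dataset 'paste_in_the_copied_link_location' filename.out
fetch_dataset() (
    # --fail: exit with an error on an HTTP failure instead of saving
    #         the server's error page as if it were the dataset
    # --location: follow redirects
    # --output: write the download to the named file
    curl --fail --location --output "$2" "$1"
)
```

After the download, 'samtools view -c filename.out' is a quick way to check that the file is a readable BAM.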

Samtools: http://samtools.sourceforge.net
Samtools manual: http://samtools.sourceforge.net/samtools.shtml

Hopefully this helps and you can proceed with your analysis with confidence that your data is intact,

Best,

Jen
Galaxy team

On 4/29/12 8:53 AM, Dhanushki Samaranayake wrote:
Hi,


Earlier I tried to upload larger bam files (3.5GB, 3.4GB and 4GB) to my
Galaxy account, but failed. Your advice was to use FTP upload at
http://wiki.g2.bx.psu.edu/FTPUpload. I followed the screencast on the
website and did exactly as it advised. I used the FileZilla FTP client,
uploaded the files to my Galaxy account and executed. Now the problem is at
the execution step. For example, my 3.5GB file is accurately uploaded,
but once I execute, the file I get is 2.5GB. The file seems to be somehow
truncated. Please advise!


Thanks

Dhanushki



___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

--
Jennifer Jackson
http://galaxyproject.org