Hello Peter,

Breaking this issue into the following 2 parts, here is the status:
1. Don't alter the contents of files being uploaded to a data library when the "upload_directory" or "upload_paths" options are used in conjunction with the "Link to files without copying into Galaxy" option. This issue has been resolved in change set 5221:b5ecb8f4839d.

2. Determine whether a BAM file is sorted before it is introduced into the Galaxy environment, so that it is sorted only if necessary. We have a very simple test for this in the Bam class's _is_coordinate_sorted() method in ~/lib/galaxy/datatypes/binary.py, but this method obviously needs improvement. The improved implementation is non-trivial, but it is high priority, so it should be completed soon. In the meantime, BAM files cannot be uploaded to a data library using the combination of options described in 1 above if they do not pass the current simple, rigid test in that method.

Thanks for your message,

Greg Von Kuster

On Mar 10, 2011, at 1:18 PM, Peter Cock wrote:

> Hi all,
>
> I think I have fallen over the same problem Glen Beane reported in Nov 2010,
> http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003943.html
>
> I recall from reading the mailing list, that when you import a BAM file
> into Galaxy, it gets sorted and indexed. That makes sense since most
> tools need that to be done, and resorting an already sorted file should
> be quick.
>
> However, I'm trying to import some presorted and indexed BAM files
> into a library in Galaxy via the Admin settings, linking to the file not
> copying it. I'm getting this:
>
> <quote>
>
> File size: 4.4 Gb
> Data type: auto
> Build: ?
> Miscellaneous information: uploaded bam fileTraceback (most recent
> call last): File "/opt/galaxy-dist/tools/data_source/upload.py", line
> 447, in __main__() File
> "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
> add_file( dataset, registry, j
> Job Standard Output
>
> [bam_sort_core] merging from 28 files...
>
> Job Standard Error
>
> Traceback (most recent call last):
> File "/opt/galaxy-dist/tools/data_source/upload.py", line 447, in
> __main__()
> File "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
> add_file( dataset, registry, json_file, output_path )
> File "/opt/galaxy-dist/tools/data_source/upload.py", line 381, in add_file
> datatype.groom_dataset_content( output_path )
> File "/opt/galaxy-dist/lib/galaxy/datatypes/binary.py", line 98, in
> groom_dataset_content
> shutil.move( samtools_created_sorted_file_name, file_name )
> File "/usr/local/lib/python2.6/shutil.py", line 260, in move
> copy2(src, real_dst)
> File "/usr/local/lib/python2.6/shutil.py", line 95, in copy2
> copyfile(src, dst)
> File "/usr/local/lib/python2.6/shutil.py", line 51, in copyfile
> with open(dst, 'wb') as fdst:
> IOError: [Errno 13] Permission denied: '/data/XXX-bwa-out.sorted.bam'
>
> error
> Database/Build: ?
> Number of data lines: None
> Disk file: /data/XXX-bwa-out.sorted.bam
>
> </quote>
>
> Clearly from the error message, Galaxy is trying to edit the source
> file (and the Unix account it is running in only has read permission
> for this file and its containing folder). From the stdout message
> "[bam_sort_core] merging from 28 files..." it looks like Galaxy is
> trying to (re)sort my file, and may well attempt to reindex it. Is that
> likely to be the case?
>
> If the "copy" option had been used, then sorting and indexing
> should work - but I want Galaxy to link to the file as it is.
>
> If however "copy" was not selected, then I don't want Galaxy
> trying to alter the file like this. Could the sort+index be disabled
> in this mode? I think it is reasonable to expect administrators
> trying to import BAM files from the local file system to take
> care of this.
>
> Alternatively, you could actually check if the BAM file is pre-sorted
> or not.
>
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
> http://lists.bx.psu.edu/

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu
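
For reference on point 2 above, here is a minimal sketch of one possible header-based check. It is not the code in the Bam class's _is_coordinate_sorted() method; the function names read_bam_header_text and header_claims_coordinate_sorted are hypothetical and chosen only for illustration. It assumes the file follows the standard BAM layout (BGZF compression, a "BAM\1" magic string, then a length-prefixed SAM header) and it only reports what the @HD line claims. It does not verify that the alignment records are actually in coordinate order, so a file with a missing or wrong header would be misjudged.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF is a series of concatenated gzip members, so the standard
    # gzip module can decompress it without any extra dependency.
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file: %r" % (path,))
        # l_text: little-endian 32-bit length of the header text.
        (l_text,) = struct.unpack("<i", handle.read(4))
        raw = handle.read(l_text)
    # The header text may be NUL-padded.
    return raw.rstrip(b"\x00").decode("ascii", "replace")


def header_claims_coordinate_sorted(header_text):
    """True if the @HD line carries the tag SO:coordinate."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields are tab-separated TAG:VALUE pairs after the record type.
            return "SO:coordinate" in line.split("\t")[1:]
    # No @HD line: the sort order is unknown, so treat it as unsorted.
    return False
```

Because the check only opens the file for reading, it could run against a linked, read-only file without touching its contents. Whether to trust the header or to scan the records as well is a design choice; scanning is more reliable but costs a full pass over large files.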