Dear Yarden,

Thanks for the swift response. Please see my inline comments below.

On 7/01/2015 3:09 am, Yarden Katz wrote:
> Hi,
>
> See below for comments:
>
> On Jan 5, 2015, at 6:29 AM, Maurits Evers <[email protected]> wrote:
>
>> Dear all.
>>
>> I have been trying to install&run miso on my Mac and have run into a
>> couple of problems/issues. Any help and/or clarifications would be
>> greatly appreciated.
>>
>> 1. I did a global install following the recommended installation method
>> using pip. Everything seems to install fine, and importing misopy and
>> pysplicing from within python works. However, miso, module_availability
>> and test_miso are unknown commands. Chasing the binaries on my machine,
>> I can see that they are located at
>> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin. Adding
>> this location to PATH fixes the issue of the unknown miso executables.
>> Do I need to add anything else?
> When you install MISO with a package manager like "pip", the executables of
> the package (binaries like "miso"), get placed at a system-specific binary
> directory -- whose location is unfortunately not standard -- and in your case
> happens to be
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin. It is
> sometimes placed in ~/.local/bin. So that has to be in your PATH for the
> executables to be accessible. You only need to do it once and all
> executables from all Python packages should be available, so no need to do
> anything else.
>
> A more ideal solution in general is to use pip along with virtualenv, to make
> a virtual environment that contains all the packages needed for a particular
> task -- but it's of course not required.

I understand, thanks for the clarification. The documentation recommends a
global install (rather than a local one using virtualenv), so you might want
to make a note in the docs if a local installation is preferable.

>
>> 2. As to testing the install, module_availability runs fine. test_miso
>> returns a "Run 0 tests in 0.000s".
>> When I try to execute test_miso from within
>> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy
>> via python test_miso.py it seems to run the 3 tests mentioned in the
>> documentation, but I end up with errors such as the following
>>
>> .Testing conversion of SAM to BAM...
>> Executing: sam_to_bam --convert /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-data/sam-data/c2c12.Atp2b1.sam /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output/sam-output
>> Converting SAM to BAM...
>> Traceback (most recent call last):
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/sam_to_bam", line 9, in <module>
>>     load_entry_point('misopy==0.5.2', 'console_scripts', 'sam_to_bam')()
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 63, in main
>>     sam_to_bam(sam_filename, output_dir, header_ref=ref)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 13, in sam_to_bam
>>     os.makedirs(output_dir)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
>>     makedirs(head, mode)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
>>     mkdir(name, mode)
>> OSError: [Errno 13] Permission denied: '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output'
>>
>> I don't know why test_miso fails to run properly. Is this something to
>> worry about?
> This is a kink which is partly our fault and partly the terrible and
> cumbersome way in which Python packages work. Our test suite needs to create
> files.
> Since you did not use virtualenv (which would create a mini
> environment in directories where you have write access), your package got
> installed by pip in a system-wide directory (/opt/local/...). As a user, it
> looks like you don't have write access there, so our test suite fails because
> it needs to create files and it cannot. I'll work around this in the next
> release.

Got it, thanks!

>
>> 3. I have paired-end mouse RNA-seq data which I mapped to the mm10
>> reference genome using tophat. The bam file is sorted and indexed, and I
>> indexed successfully the gff annotation file. Upon running miso with
>>
>> miso --run indexed ../alignment/tophat/WT.bam --settings-filename
>> miso_settings.txt --output-dir WT/ --paired-end 472 277 --read-len 120
> Your "--paired-end" parameters look very off -- your insert length
> distribution most likely does not have a mean of 472 and a standard deviation
> of 277. The standard deviation looks far too big, are you sure it's not
> sqrt(277) = ~17?

As a matter of fact, the numbers I provided are the median and median
absolute deviation rather than the mean and sd. Values were calculated using
picard tools. Indeed, the variance in fragment size seems very large. I
checked the values using cufflinks, which gives mean = 284 nt, sd = 90 nt.
This seems more realistic and consistent with the library prep protocol.

>> I get the warning that miso found mixed length reads within the BAM
>> file. Prior to mapping, reads were adapter-trimmed and quality-filtered
>> so naturally aligned reads will have a read-length distribution. I don't
>> understand what to make of this warning. I would assume that most
>> RNA-seq data consists of different read lengths, due to some form of
>> trimming/filtering of the raw data. I don't understand why miso would
>> require reads to have the same length in order to be able to estimate
>> isoform expression. Could you advise how to proceed?
>> The read length distribution shows reads with lengths between 20 and
>> 120 nt. Running miso for each of the read lengths separately would be
>> possible but tedious, requiring 100 separate runs followed by merging
>> the individual output files.
> It's unfortunately the case that for now MISO requires the reads to be the
> same length. In our experience, trimming the adapters can certainly create
> variability, but a variation between 20 and 120 is far larger than I've seen,
> and seems extreme. In most cases, reads hover around a certain length, such
> that the minimum length is still basically "as good" as the longest length
> reads. E.g. if your reads were between 35-45, you could just trim the reads
> to 35 -- so you'd have the exact same number of reads (just shorter), and you
> wouldn't need multiple runs. But we will adapt MISO to work with multiple
> read lengths (it requires substantial changes to the code currently.)
>
> What fraction of your reads would you lose if you took reads that are at
> least 100? Since the adapter is fixed length, I'm assuming most of your
> trimming is caused by poor base quality. It seems very extreme to have to
> trim off over 80% of the read, i.e. going from 120 nt to 20 nt, and it
> shouldn't happen frequently in a high quality RNA-Seq run.

I don't agree with you on this point. Consider the following situation: a
considerable spread in fragment sizes (which you get if no additional size
selection is applied following PCR amplification), a read length of 120 nt,
and paired-end reads. You will sequence (partially) into the adapters at the
5'/3' ends if the gap size is such that
length(left read) + length(right read) + gap ~= fragment size (you can even
have the case where the fragment size is smaller than the sum of the read
lengths).
In this case you will end up with a post-trimming read length distribution
that covers lengths from a minimum (usually 18-20 nt) to the raw untrimmed
read length (in this case 120 nt), due to sequencing parts (of varying
lengths) of the adapter(s). Most high-quality RNA-seq data (both small RNA
and mRNA protocols) that I have worked with has had a similar distribution of
read lengths, so it does not seem to be very unusual. Of course, you can
check using a few sample SRA data sets.

Trimming longer reads down to a fixed length just to get constant read
lengths sounds like a dubious strategy and should not be done in my opinion.

For now I will stick with cufflinks to estimate isoform expression, as
cufflinks does not require reads to have a fixed read length. I don't
understand the reason for this requirement in miso, nor why this would be a
sensible thing to do for the problem of estimating isoform expression
strengths, and I think this is a serious limitation to a very promising tool.
I will keep an eye out for future releases.

Cheers,
Maurits

> Yarden
>
>> Best regards,
>> Maurits
>>
>>
>> --
>> Dr. Maurits Evers
>> Center for Integrative Bioinformatics Vienna
>> Max F. Perutz Laboratories
>> Dr. Bohr Gasse 9
>> A-1030 Vienna, Austria
>>
>>
>>
>> --
>> Dr. Maurits Evers
>> Statistical Bioinformatics
>> Institute of Functional Genomics
>> University of Regensburg
>> Josef-Engert-Str. 9 (Biopark I)
>> 93053 Regensburg, Germany
>> _______________________________________________
>> miso-users mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/miso-users
