Re: [miso-users] Install miso using pip and mixed read length BAMs

Maurits Evers Wed, 07 Jan 2015 01:31:48 -0800

Dear Yarden.

Thanks for the swift response. Please see my inline comments below.


On 7/01/2015 3:09 am, Yarden Katz wrote:
> Hi,
>
> See below for comments:
>
> On Jan 5, 2015, at 6:29 AM, Maurits Evers <[email protected]> wrote:
>
>> Dear all.
>>
>> I have been trying to install and run miso on my Mac and have run into a
>> couple of problems/issues. Any help and/or clarifications would be
>> greatly appreciated.
>>
>> 1. I did a global install following the recommended installation method
>> using pip. Everything seems to install fine, and importing misopy and
>> pysplicing from within python works. However, miso, module_availability
>> and test_miso are unknown commands. Chasing the binaries on my machine,
>> I can see that they are located at
>> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin. Adding
>> this location to PATH fixes the issue of the unknown miso executables.
>> Do I need to add anything else?
> When you install MISO with a package manager like "pip", the executables of
> the package (binaries like "miso") get placed in a system-specific binary
> directory -- whose location is unfortunately not standard -- and in your case
> happens to be 
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin.  It is 
> sometimes placed in ~/.local/bin.  So that has to be in your PATH for the 
> executables to be accessible.  You only need to do it once and all 
> executables from all Python packages should be available, so no need to do 
> anything else.
>
> A better solution in general is to use pip along with virtualenv, to make
> a virtual environment that contains all the packages needed for a particular 
> task -- but it's of course not required.
I understand, thanks for the clarification. The documentation recommends 
a global install (rather than a local one using virtualenv), so you 
might want to make a note in the docs if a local installation is preferable.
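
For anyone else hitting the "unknown command" problem, here is a minimal sketch (not part of MISO) for checking where the running Python puts its scripts and whether that directory is on PATH. The function name scripts_dir_on_path is made up for illustration. It assumes the package was installed with the same interpreter and the default install scheme. A "pip install --user" uses a different location, e.g. ~/.local/bin as mentioned above.

```python
import os
import sysconfig


def scripts_dir_on_path(path_env=None):
    """Return (scripts_dir, is_on_path) for the running interpreter.

    Hypothetical helper: scripts_dir is where the default install scheme
    places console scripts; is_on_path says whether PATH contains it.
    """
    scripts_dir = sysconfig.get_path("scripts")
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    # Normalise entries so trailing slashes do not cause false negatives.
    entries = [os.path.normpath(p) for p in path_env.split(os.pathsep) if p]
    return scripts_dir, os.path.normpath(scripts_dir) in entries


scripts_dir, on_path = scripts_dir_on_path()
print("scripts directory: %s (on PATH: %s)" % (scripts_dir, on_path))
```

If the second value is False, adding the printed directory to PATH once should be enough.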
>
>> 2. As to testing the install, module_availability runs fine. test_miso
>> returns "Ran 0 tests in 0.000s". When I try to execute test_miso from
>> within
>> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy
>> via python test_miso.py it seems to run the 3 tests mentioned in the
>> documentation, but I end up with errors such as the following
>>
>>      .Testing conversion of SAM to BAM...
>>      Executing: sam_to_bam --convert /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-data/sam-data/c2c12.Atp2b1.sam /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output/sam-output
>>      Converting SAM to BAM...
>>      Traceback (most recent call last):
>>        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/sam_to_bam", line 9, in <module>
>>          load_entry_point('misopy==0.5.2', 'console_scripts', 'sam_to_bam')()
>>        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 63, in main
>>          sam_to_bam(sam_filename, output_dir, header_ref=ref)
>>        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/sam_to_bam.py", line 13, in sam_to_bam
>>          os.makedirs(output_dir)
>>        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
>>          makedirs(head, mode)
>>        File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
>>          mkdir(name, mode)
>>      OSError: [Errno 13] Permission denied: '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/misopy/test-output'
>>
>> I don't know why test_miso fails to run properly. Is this something to
>> worry about?
> This is a kink which is partly our fault and partly due to the terrible and
> cumbersome way in which Python packages work.  Our test suite needs to create
> files.  Since you did not use virtualenv (which would create a mini
> environment in directories to which you have write access), your package got
> installed by pip in a system-wide directory (/opt/local/...).  As a user, it
> looks like you don't have write access there, so our test suite fails because
> it needs to create files and it cannot.  I'll work around this in the next
> release.
Got it, thanks!
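
In case it is useful, one possible workaround can be sketched as follows. This is only an illustration, not how the MISO test suite works or will work. The helper name pick_output_dir and the "miso-test-output-" prefix are made up. The idea is to write into the preferred directory if it can be created there, and otherwise fall back to a fresh temporary directory.

```python
import os
import tempfile


def pick_output_dir(preferred_dir):
    """Return preferred_dir if it looks creatable/writable, else a temp dir.

    Hypothetical helper for illustration only.
    """
    # Walk up to the nearest ancestor that already exists.
    probe = os.path.abspath(preferred_dir)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    # Creating entries inside a directory needs write and search permission.
    if os.path.isdir(probe) and os.access(probe, os.W_OK | os.X_OK):
        return preferred_dir
    # Fall back to a new directory that the current user owns.
    return tempfile.mkdtemp(prefix="miso-test-output-")
```

Note that os.access only reports what the current user is allowed to do at the time of the check, so the later os.makedirs call can still fail and should be handled as well.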
>
>> 3. I have paired-end mouse RNA-seq data which I mapped to the mm10
>> reference genome using tophat. The bam file is sorted and indexed, and I
>> successfully indexed the gff annotation file. Upon running miso with
>>
>>      miso --run indexed ../alignment/tophat/WT.bam --settings-filename
>>      miso_settings.txt --output-dir WT/ --paired-end 472 277 --read-len 120
> Your "--paired-end" parameters look very off -- your insert length 
> distribution most likely does not have a mean of 472 and a standard deviation 
> of 277.  The standard deviation looks far too big; are you sure it's not
> sqrt(277) = ~17?
As a matter of fact, the numbers I provided are the median and median
absolute deviation rather than the mean and sd. Values were calculated
using picard tools. Indeed, the variation in fragment size seems
very large. I checked the values using cufflinks, which gives mean = 284 nt,
sd = 90 nt. This seems more realistic and consistent with the library
prep protocol.
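
To make the distinction concrete, here is a minimal sketch of how the two pairs of summary statistics are computed from a list of insert sizes. The function name insert_size_summary is made up, and the input is assumed to be a plain list of insert sizes in nt that has already been extracted from the alignments. It needs the statistics module from Python 3.4 or later.

```python
import statistics


def insert_size_summary(insert_sizes):
    """Return mean/sd and median/MAD for a list of insert sizes.

    Hypothetical helper: MAD is the median of the absolute deviations
    from the median, and sd is the sample standard deviation.
    """
    med = statistics.median(insert_sizes)
    mad = statistics.median([abs(x - med) for x in insert_sizes])
    return {
        "mean": statistics.mean(insert_sizes),
        "sd": statistics.stdev(insert_sizes),
        "median": med,
        "mad": mad,
    }
```

The two pairs only agree for well-behaved, symmetric distributions, so the median and MAD should not be passed where a mean and sd are expected.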
>> I get the warning that miso found mixed length reads within the BAM
>> file. Prior to mapping, reads were adapter-trimmed and quality-filtered
>> so naturally aligned reads will have a read-length distribution. I don't
>> understand what to make of this warning. I would assume that most
>> RNA-seq data consists of different read lengths, due to some form of
>> trimming/filtering of the raw data. I don't understand why miso would
>> require reads to have the same length in order to be able to estimate
>> isoform expression. Could you advise how to proceed? The read length
>> distribution shows reads with lengths between 20 and 120 nt. Running
>> miso for each of the read lengths separately would be possible but
>> tedious, requiring 100 separate runs followed by merging the individual
>> output files.
> It's unfortunately the case that for now MISO requires the reads to be the 
> same length.  In our experience, trimming the adapters can certainly create 
> variability, but a variation between 20 and 120 is far larger than I've seen, 
> and seems extreme.  In most cases, reads hover around a certain length, such 
> that the minimum length is still basically "as good" as the longest length 
> reads.  E.g. if your reads were between 35-45, you could just trim the reads 
> to 35 -- so you'd have the exact same number of reads (just shorter), and you 
> wouldn't need multiple runs.  But we will adapt MISO to work with multiple 
> read lengths (it requires substantial changes to the code currently.)
>
> What fraction of your reads would you lose if you took reads that are at 
> least 100?  Since the adapter is fixed length, I'm assuming most of your
> trimming is caused by poor base quality.  It seems very extreme to have to 
> trim off over 80% of the read, i.e. going from 120 nt to 20 nt, and it 
> shouldn't happen frequently in a high quality RNA-Seq run.
I don't agree with you on this point. Consider the following situation:
a broad fragment size distribution (which you get if no additional
size selection is applied following PCR amplification), a read length of 
120 nt, and paired-end reads. You will sequence (partially) into the 
adapters at the 5'/3' ends, if the gap size is such that length(left 
read)+length(right read)+gap ~= fragment size (You can even have the 
case where the fragment size is smaller than the sum of the read 
lengths). In this case you will end up with a post-trimming read length 
distribution that covers lengths from a minimum (usually 18-20nt) to the 
raw untrimmed read length (in this case 120nt), due to sequencing parts 
(of varying lengths) of the adapter(s). Most high-quality RNA-seq data 
(both small RNA and mRNA protocols) that I have worked with has had a 
similar distribution of read lengths, so it does not seem to be very 
unusual. Of course, you can check using a few sample SRA data sets.
Trimming longer reads down to a fixed length just to get constant read 
lengths sounds like a dubious strategy and should not be done in my opinion.
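
The argument above can be sketched in a few lines. This is an idealized model with made-up function names. It assumes that adapter trimming removes exactly the read-through part, that there is no quality trimming, and that reads shorter than a minimum length are discarded. The defaults of 120 nt and 20 nt are taken from this thread.

```python
def trimmed_read_length(fragment_len, read_len=120, min_len=20):
    """Read length left after removing adapter read-through.

    Idealized: a read covers at most the whole fragment, and anything
    beyond the fragment end is adapter and gets trimmed. Returns None if
    the trimmed read is shorter than min_len and would be discarded.
    """
    kept = min(fragment_len, read_len)
    return kept if kept >= min_len else None


def fraction_retained(lengths, min_keep):
    """Fraction of surviving reads that are at least min_keep long."""
    surviving = [n for n in lengths if n is not None]
    if not surviving:
        return 0.0
    return sum(1 for n in surviving if n >= min_keep) / float(len(surviving))
```

Applying trimmed_read_length to a list of fragment sizes and then calling fraction_retained with a cutoff of 100 would answer the question of how many reads a length filter would drop under these assumptions.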

For now I will stick with cufflinks to estimate isoform expression, as 
cufflinks does not require reads to have a fixed read length. I don't 
understand the reason for this requirement in miso, nor why this would 
be a sensible thing to do for the problem of estimating isoform 
expression strengths, and I think this is a serious limitation to a very 
promising tool. I will keep an eye out for future releases.

Cheers,
Maurits

> Yarden
>
>> Best regards,
>> Maurits
>>
>>
>> -- 
>> Dr. Maurits Evers
>> Center for Integrative Bioinformatics Vienna
>> Max F. Perutz Laboratories
>> Dr. Bohr Gasse 9
>> A-1030 Vienna, Austria
>>
>>
>>
>> -- 
>> Dr. Maurits Evers
>> Statistical Bioinformatics
>> Institute of Functional Genomics
>> University of Regensburg
>> Josef-Engert-Str. 9 (Biopark I)
>> 93053 Regensburg, Germany
>> _______________________________________________
>> miso-users mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/miso-users
