I checked in a fix for the splitting to CompressedVRangesList. The
slowness of creating a SimpleVRangesList is due to the cost of
extracting a VRanges for each sample. Depending your exact use case, it
might be better to pay that cost up-front, instead of deferring it to
when the user wants to extract an element, which happens with the
compressed list. As long as the number of samples is small, the memory
overhead should be minimal.
Michael
On Wed, Feb 25, 2015 at 9:59 AM, Michael Lawrence <micha...@gene.com
<mailto:micha...@gene.com>> wrote:
Yea, I know, just need to get around to that one. Technically, it
works, but it's obviously not ideal.
On Wed, Feb 25, 2015 at 8:52 AM, Gabe Becker <becker.g...@gene.com
<mailto:becker.g...@gene.com>> wrote:
Why does splitting a VRanges give a GRangesList with VRanges
objects as elements? Seems like it should return a VRangesList.
> spl = split(vr, sampleNames(vr))
> class(spl)
[1] "GRangesList"
attr(,"package")
[1] "GenomicRanges"
> class(spl[[1]])
[1] "VRanges"
attr(,"package")
[1] "VariantAnnotation"
~G
On Wed, Feb 25, 2015 at 8:39 AM, Michael Lawrence
<lawrence.mich...@gene.com <mailto:lawrence.mich...@gene.com>>
wrote:
Construction will take longer; the savings are in the
accessing of the
elements. But this seems like too much longer, so I will
look into it.
On Wed, Feb 25, 2015 at 8:12 AM, Robert Castelo
<robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>>
wrote:
> my current reason to prefer a CompressedVRangesList
object over a
> SimpleVRangesList object is that i find one order of
magnitude difference
> in creation time in each of these classes of objects:
>
> library(VariantAnnotation)
>
> fl <- system.file("extdata", "CEUtrio.vcf.bgz",
> package="VariantFiltering")
>
> vcf <- readVcf(fl, genome="hg19")
> vr <- as(vcf, "VRanges")
> length(vr)
> [1] 15000
>
> ## create a VRangesList object
> system.time(vrl <- do.call("VRangesList", split(vr,
sampleNames(vr))))
> user system elapsed
> 0.247 0.004 0.252
>
> ## create a CompressedVRangesList object
> system.time(cvrl <- new("CompressedVRangesList", split(vr,
> sampleNames(vr))))
> user system elapsed
> 0.019 0.000 0.019
>
> 0.252/0.019
> [1] 13.26316
>
> with a larger vcf differences increase:
>
> [... load vcf, coerce to VRanges ...]
> length(vr)
> [1] 25916
>
> system.time(vrl <- do.call("VRangesList", split(vr,
sampleNames(vr))))
> user system elapsed
> 2.672 0.000 2.676
>
> system.time(cvrl <- new("CompressedVRangesList", split(vr,
> sampleNames(vr))))
> user system elapsed
> 0.014 0.000 0.014
>
> 2.676 / 0.014
> [1] 191.1429
>
>
> so maybe i'm using the wrong way to construct a
VRangesList object, but
> according to our last conversation about this, there was
no obvious default
> fast way to do it, starting from a VRanges object:
>
>
https://stat.ethz.ch/pipermail/bioc-devel/2015-January/006905.html
>
> it would be great if there's a fast way to do this kind
of construction.
>
> thanks,
>
> robert.
>
> On 02/25/2015 04:42 PM, Michael Lawrence wrote:
>
>> If you're storing data on a relatively small number of
individuals (say,
>> hundreds), you should use SimpleVRangesList, not
CompressedVRangesList.
>>
>> On Wed, Feb 25, 2015 at 7:10 AM, Robert Castelo
<robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>
>> <mailto:robert.cast...@upf.edu
<mailto:robert.cast...@upf.edu>>> wrote:
>>
>> i see you point, the logic i was thinking about is
to use a list of
>> VRanges objects to hold separately the variants of
multiple
>> individuals, with one VRanges object per individual.
>>
>> if i type the name of such a list object on the R
shell, having the
>> GRangesList show method, i feel i do not see much
information
>> because the screen just scrolls up tens or hundreds
of lines
>> specifiying variants per individual. however, the
concise appearance
>> of something like a VRangesList:
>>
>> > vrl
>> VRangesList of length 10
>> names(32): S1 S2 S3 S4 ... S7 S8 S9 S10
>>
>> at least suggests the user that the object holding
the variants has
>> information for 10 samples and belongs to the class
'VRangesList'.
>>
>> i thought this made general sense but i'm fine if
you feel this
>> interpretation does not warrant such a change.
>>
>> cheers,
>>
>> robert.
>>
>> On 02/25/2015 01:25 AM, Michael Lawrence wrote:
>>
>> Why not have the SimpleVRangesList be shown like
>> CompressedVRangesList,
>> for consistency with GRangesList? In other
words, the opposite
>> of what
>> you propose. A strong argument could also be
made that a
>> SimpleGenomicRangesList should be shown like a
GRangesList.
>> Unless there
>> is some aversion to the more verbose output....
>>
>> On Tue, Feb 24, 2015 at 2:36 PM, Robert Castelo
>> <robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>
<mailto:robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>>
>> <mailto:robert.cast...@upf.edu
<mailto:robert.cast...@upf.edu>
>>
>> <mailto:robert.cast...@upf.edu
<mailto:robert.cast...@upf.edu>>__>> wrote:
>>
>> so, yes, but IMO rather than inheriting the
show method from
>> a
>> GRangesList, i think that the show method for
>> CompressedVRangesList
>> objects should be inherited from a
VRangesList object.
>> right now
>> this is the situation:
>>
>> library(VariantAnnotation)
>>
>> example(VRangesList)
>> vrl
>> VRangesList of length 2
>> names(2): sampleA sampleB
>>
>> cvrl <- new("CompressedVRangesList", split(vr,
>> sampleNames(vr)))
>> cvrl
>> CompressedVRangesList object of length 2:
>> $a
>> VRanges object with 1 range and 1 metadata
column:
>> seqnames ranges strand
ref alt
>> totalDepth refDepth altDepth
>> <Rle> <IRanges> <Rle> <character> <characterOrRle>
<integerOrRle>
>> <integerOrRle> <integerOrRle>
>> [1] chr1 [1, 5] + T
>> C 12 5 7
>> sampleNames softFilterMatrix |
tumorSpecific
>> <factorOrRle> <matrix> | <logical>
>> [1] a TRUE |
FALSE
>>
>> $b
>> VRanges object with 1 range and 1 metadata
column:
>> seqnames ranges strand ref alt
totalDepth refDepth
>> altDepth
>> sampleNames softFilterMatrix |
>> [1] chr2 [10, 20] + A T
17 10
>> 6 b FALSE |
>> tumorSpecific
>> [1] TRUE
>>
>> -------
>> seqinfo: 2 sequences from an unspecified
genome; no
>> seqlengths
>>
>> would it be possible to have the
VRangesList show method for
>> CompressedVRangesList objects?
>>
>> robert.
>>
>>
>>
>> On 2/24/15 7:24 PM, Michael Lawrence wrote:
>>
>> I think you might be missing an import.
It should
>> inherit the
>> method for GRangesList.
>>
>> On Tue, Feb 24, 2015 at 9:53 AM, Robert
Castelo
>> <robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>
<mailto:robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>>
>> <mailto:robert.cast...@upf.edu
<mailto:robert.cast...@upf.edu>
>> <mailto:robert.cast...@upf.edu
<mailto:robert.cast...@upf.edu>>__>> wrote:
>>
>> hi,
>>
>> i'm using the CompressedVRangesList
class in
>> VariantFiltering
>> to hold variants and their
annotations across
>> multiple samples
>> and found that there was no show
method for this
>> class (unless
>> i'm missing the right import here)
so i made one
>> within
>> VariantFiltering by copying&pasting
from other
>> similar classes:
>>
>> setMethod("show",
>> signature(object="__CompressedVRangesList"),
>> function(object) {
>> lo <- length(object)
>>
cat(classNameForDisplay(__object), " of
>> length ",
>> lo, "\n",
>> sep = "")
>> if
(!is.null(names(object)))
>> cat(BiocGenerics:::__
>> labeledLine("names",
>> names(object)))
>> })
>>
>> i guess, however, that the right
home for this would
>> be
>> VariantAnnotation. let me know if
you consider
>> adding it there
>> (or somewhere else) and i'll remove
it from
>> VariantFiltering.
>>
>> thanks,
>>
>> robert.
>>
>>
_________________________________________________
>> Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>>
>> <mailto:Bioc-devel@r-project.
<mailto:Bioc-devel@r-project.>__org
>> <mailto:Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>>>
>> mailing list
>> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>>
>>
>>
>>
>> --
>> Robert Castelo, PhD
>> Associate Professor
>> Dept. of Experimental and Health Sciences
>> Universitat Pompeu Fabra (UPF)
>> Barcelona Biomedical Research Park (PRBB)
>> Dr Aiguader 88
>> E-08003 Barcelona, Spain
>> telf: +34.933.160.514 <tel:%2B34.933.160.514>
<tel:%2B34.933.160.514>
>> fax: +34.933.160.550 <tel:%2B34.933.160.550>
<tel:%2B34.933.160.550>
>>
>>
>>
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Experimental and Health Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514 <tel:%2B34.933.160.514>
> fax: +34.933.160.550 <tel:%2B34.933.160.550>
>
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research