Re: [Bioc-sig-seq] Rle vs RangedData

Michael Lawrence Fri, 26 Jun 2009 06:06:11 -0700

On Fri, Jun 26, 2009 at 3:32 AM, Simon Anders <[email protected]> wrote:


> Dear Michael and Patrick
>
> As you may have noticed, my HilbertVis package requires the input data to
> be presented as an ordinary vector. Obviously, it would be much better for
> performance to use a run-length-encoded vector (and the stand-alone version
> of HilbertVis already does that).
>
> So, I wanted to add the functionality to use either Rle objects or
> RangedData objects as input to the hilbertDisplay function and got confused
> about the two classes.
>
> Rle seems to be simple and lightweight, but I cannot see how I could
> perform fast random access. If I want to access an element of the vector
> with a given position somewhere in the middle, I suppose I cannot avoid
> having to add up all the lengths in order to find the right value. Is there
> any reason why you store lengths of the constant intervals in the Rle object
> rather than their start points? In the latter case one could achieve random
> access in time O(log n) as opposed to O(n). Or are the start points cached
> somewhere internally?
>
> RangedData does seem to store the data in the start/value scheme that seems
> more advantageous to me. However, it has a rather heavyweight slot
> structure. Do I understand correctly that the canonical way to get the start
> and data vectors from a RangedData object 'rd' would be 'start(rd)' and
> 'rd$score' (or maybe better 'rd[[1]]')?
>
> As the most likely input for hilbertDisplay is the output of the 'coverage'
> function, which is an Rle object, it seems to make sense to change
> hilbertDisplay to accept this. However, for performance reasons, I should
> then probably convert it to RangedData.
>
> Would you agree?
>
> Can you shed some light on when you intended "Rle" to be used and when
> "RangedData"?
>

RangedData is for storing genome-wide tracks. It handles multiple variables,
and is internally split across the chromosomes. It also allows gaps and an
arbitrary ordering of features. It is aimed more at high-level,
multivariate analysis.

An Rle is useful whenever a vector has many repeated values and is consuming
too much memory.

For your purpose, an Rle object, even one that stores only the run widths,
would be a better fit than RangedData. Just getting the starts out of a
RangedData is an O(n) operation, and RangedData in general carries a lot of
overhead for functionality that is not useful in your case.
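
On the random-access question, here is a minimal sketch of how O(log n)
lookups can be had from run lengths alone: compute the cumulative run ends
once (O(n)) and binary-search them per lookup. It uses only base R (rle(),
cumsum(), findInterval()), not the IRanges Rle class, and the helper name
rle_lookup is hypothetical. It is an illustration, not a description of what
IRanges does internally.

```r
# Base-R sketch; rle() here is base R's, not the IRanges Rle class.
x <- c(rep(0, 5), rep(3, 2), rep(7, 4), rep(0, 6))
r <- rle(x)                     # r$lengths: 5 2 4 6; r$values: 0 3 7 0

# Run end positions, computed once in O(n) and then reused for every lookup.
run_ends <- cumsum(r$lengths)   # 5 7 11 17

# Hypothetical helper: value(s) at 1-based position(s) 'pos'.
# findInterval() binary-searches 'run_ends', so each lookup is O(log n).
# left.open = TRUE makes a position equal to a run end fall in that run.
rle_lookup <- function(values, run_ends, pos) {
  stopifnot(all(pos >= 1), all(pos <= run_ends[length(run_ends)]))
  values[findInterval(pos, run_ends, left.open = TRUE) + 1L]
}

rle_lookup(r$values, run_ends, c(1, 5, 6, 8, 17))   # 0 0 3 7 0
```

If the run ends are kept alongside the run values, storing lengths rather
than start points costs nothing for lookup after the first cumsum().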

Michael


> Thanks.
>
>  Simon
>
>
> +---
> | Dr. Simon Anders, Dipl. Phys.
> | European Bioinformatics Institute (EMBL-EBI)
> | Hinxton, Cambridgeshire, UK
> | office phone +44-1223-492680, mobile phone +44-7505-841692
> | preferred (permanent) e-mail: [email protected]
>


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
