Re: [Bioc-sig-seq] drop factor level for seqnames of a GRanges object

Hervé Pagès Wed, 06 Oct 2010 17:15:31 -0700

Hi Michael, Chris,

Yes it seems that at the moment seqlengths<-(), the replacement form
of seqlengths(), doesn't let me remove levels, only add new ones:


> gr
GRanges with 2 ranges and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
  A  B  C
 NA NA NA

> seqlengths(gr) <- seqlengths(gr)[-3]
> gr
GRanges with 2 ranges and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
  A  B  C
 NA NA NA

> seqlengths(gr) <- c(seqlengths(gr), D=40L)
> gr
GRanges with 2 ranges and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        A   [5, 11]      * |
[2]        B   [6, 12]      * |

seqlengths
  A  B  C  D
 NA NA NA 40

I agree with Michael that it would be nice to be able to do

  seqlengths(gr) <- seqlengths(gr)[-3]

and have level C actually dropped. And of course

  seqlengths(gr) <- seqlengths(gr)[-1]

should fail (one should not be allowed to drop levels that are used).

Cheers,
H.


On 10/06/2010 02:30 PM, Michael Lawrence wrote:

At the moment, this is extremely difficult and requires obscure hacks to
achieve. It's been brought up numerous times.

It happens all the time when comparing gene annotations to read alignments.
The gene annotations from GenomicFeatures include the entire set
chromosomes, which is nice, but the sequence reads are usually loaded in a
genome-independent way and only have the chromosomes present in the data. It
would be best to add the additional chromosomes to the read dataset, even if
there are no mappings, rather than throw away intervals from the other set.

The main problem is that the seqlengths and seqnames need to updated
simultaneously.  Really one should take precedence over the other, and I
recently changed the GRanges constructor to favor seqlengths, so that the
levels of seqnames are set to the names of seqlengths. Thus, it would be
nice if one could call seqlengths<- and have it add the necessary levels to
seqnames automatically.  Then you can just do:

seqlengths(reads)<- seqlengths(genes)

and be done with it. Of course, if seqnames contains values that are not
present in the new seqlengths, it should fail.

Comments? Should I make it work this way?

Michael

On Wed, Oct 6, 2010 at 2:10 PM, Chris Seidel<[email protected]>  wrote:

Hello,

How do I remove a factor level for the sequence names of a GRanges object?

Sometimes I work with data sets that have not been aligned to all the
same chromosomes. To make them comparable I have to remove or ignore
reads for the odd chromosome. So if I have a GRanges object for which I
want to remove all reads matching a given chromosome, e.g.:

gr<- gr[seqnames(gr) != "chrXHet"]

levels(seqnames(gr)) will return all original seqnames, but I would like
to exclude "chrXHet" since I've removed all the reads matching that
chromosome. How do I do this?

-Chris

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] drop factor level for seqnames of a GRanges object

Reply via email to