On Wed, Nov 3, 2010 at 11:14 PM, Kunbin Qu <[email protected]> wrote:

> Dear all,
>
> I have a Bowtie mapping file from an RNA-seq run against the human genome.
> I created a RangedDataList to represent the mapping coordinates from different
> chromosomes, strands and lanes. Now I would like to eliminate the RangedData
> entries that have the same IRanges start and end, chromosome number and
> strand orientation.
>
> In the following example, entries 1, 3 and 4 have the same chromosome,
> strand, start and end, and after the procedure they should be reduced to
> one entry. Is there a function I can use? Or is there a better way
> to represent the mapping info, including chromosome, strand, start and
> end, than RangedData? Thanks.
>
>

I would say GRanges is better, but it does not have a unique() function. Are
you aware that ShortRead performs this sort of filtering on AlignedRead,
even during input? See help(occurrenceFilter).
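
As a rough illustration of the de-duplication itself, here is a minimal sketch
in base R. It assumes the coordinates have first been pulled into a plain
data.frame; the object names (reads, key, uniq) are made up for the example,
and the values are copied from the output quoted below. For a real RangedData
object, as.data.frame() may give a similar layout, but that is an assumption
and has not been checked against the package versions listed below.

```r
# Toy data copied from the example output in the quoted message.
reads <- data.frame(
  space  = rep("chr1", 6),
  start  = c(223780005, 89018675, 223780005, 223780005, 107921032, 243086472),
  end    = c(223780055, 89018725, 223780055, 223780055, 107921082, 243086522),
  strand = c("+", "-", "+", "+", "-", "-"),
  index  = c(6L, 55L, 68L, 69L, 75L, 86L),
  stringsAsFactors = FALSE
)

# Columns that define "the same read position".
key <- reads[, c("space", "strand", "start", "end")]

# duplicated() flags every repeat after the first occurrence of a key,
# so negating it keeps exactly one row per distinct position.
uniq <- reads[!duplicated(key), ]
```

Here rows 3 and 4 are dropped as repeats of row 1, leaving four entries. Which
of the repeated rows survives (the first) is a choice made by this sketch.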


> -Kunbin
>
>
> > head(sLane[["s_1"]][3])
> RangedData with 6 rows and 2 value columns across 1 space
>        space                 ranges |   strand     index
>  <character>              <IRanges> | <factor> <integer>
> 1        chr1 [223780005, 223780055] |        +         6
> 2        chr1 [ 89018675,  89018725] |        -        55
> 3        chr1 [223780005, 223780055] |        +        68
> 4        chr1 [223780005, 223780055] |        +        69
> 5        chr1 [107921032, 107921082] |        -        75
> 6        chr1 [243086472, 243086522] |        -        86
> > class(sLane[["s_1"]][3])
> [1] "RangedData"
> attr(,"package")
> [1] "IRanges"
> > class(sLane[["s_1"]])
> [1] "RangedData"
> attr(,"package")
> [1] "IRanges"
> > class(sLane)
> [1] "RangedDataList"
> attr(,"package")
> [1] "IRanges"
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
> [4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0 grid_2.11.0   hwriter_1.2   tools_2.11.0
> >
>
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

