On 02/18/2016 01:00 AM, Maintainer wrote:
Hello,

Hi,

I'm putting this back to bioc-devel and setting the Subject to
"Strand-Awareness for Restrict Function" since this seems to be
a continuation of this thread (that you started here). I hope you
don't mind.


Setdiff automatically reduces the resulting ranges. Is it intentional ?

Absolutely. The "vector-wise set operations" (i.e. union, intersect,
setdiff) treat 'x' and 'y' as 2 sets of genomic positions (or sets of
integers when 'x' and 'y' are Ranges objects) so duplicated positions
are removed.

I can't see any formal definition of what setdiff is supposed to do, so I'm not 
sure. For example,

setdiff(IRanges(1:10, width =1), IRanges(2, 4))

reduces all of the adjacent ranges in the first variable. This is not 
documented.

The man page for set operations on Ranges and RangesList objects says:

  The 'union', 'intersect' and 'setdiff' methods for Ranges objects
  return a "normal" Ranges object representing the union, intersection
  and (asymmetric!) difference of the sets of integers represented by
  'x' and 'y'.

Admittedly this man page was too hard to find (you had to do
?`setdiff,Ranges,Ranges-method`, and ?`setdiff,GRanges,GRanges-method`
to get the man page for the methods for GRanges objects) and the above
explanation could probably be clearer. I've tried to remedy this (in
IRanges 2.5.32 and GenomicRanges 1.23.20) by adding aliases to these
man pages so they can be found with just ?setdiff (or ?union, or
?intersect).


Also, it would be good if it had a strict.strand option

All the setops methods for genomic ranges have an 'ignore.strand'
option. Is that what you are asking for or do you have something
else in mind with 'strict.strand'?

and an explanation of which variables the ... option accepts.
Currently, it does not have any effect and the documentation of ... is 
ambiguous.

humanGenome
GRanges object with 23 ranges and 0 metadata columns:
        seqnames         ranges strand
           <Rle>      <IRanges>  <Rle>
    [1]     chr1 [1, 249250621]      *
    [2]     chr2 [1, 243199373]      *
    [3]     chr3 [1, 198022430]      *
    [4]     chr4 [1, 191154276]      *
    [5]     chr5 [1, 180915260]      *
humanGenes
GRanges object with 22980 ranges and 1 metadata column:
         seqnames                 ranges strand |     gene_id
            <Rle>              <IRanges>  <Rle> | <character>
       1    chr19 [ 58858172,  58874214]      - |           1
      10     chr8 [ 18248755,  18258723]      + |          10
     100    chr20 [ 43248163,  43280376]      - |         100
    1000    chr18 [ 25530930,  25757445]      - |        1000
   10000     chr1 [243651535, 244006886]      - |       10000
setdiff(humanGenome, humanGenes, strict.strand = FALSE)  # Has no effect 
because of transcripts' strands.
GRanges object with 23 ranges and 0 metadata columns:
        seqnames         ranges strand
           <Rle>      <IRanges>  <Rle>
    [1]     chr1 [1, 249250621]      *
    [2]     chr2 [1, 243199373]      *
    [3]     chr3 [1, 198022430]      *
    [4]     chr4 [1, 191154276]      *
    [5]     chr5 [1, 180915260]      *

--------------------------------------

The ellipsis was actually not needed (and was not used) in the
setops methods for genomic ranges so I removed it in GenomicRanges
1.23.20.

Back to your original question: setdiff() is not the right tool for
the strand-aware trimming you're trying to achieve but it occurs to
me that psetdiff() (element-wise setdiff) might be a suitable
alternative (although you have some work to do to prepare an 'y'
that is parallel to 'x' and contains the regions to trim). I don't
think it's going to be easier/simpler than the restrict-based solution
I gave you earlier though.

H.

Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia


________________________________________________________________________
devteam-bioc mailing list
To unsubscribe from this mailing list send a blank email to
devteam-bioc-le...@lists.fhcrc.org
You can also unsubscribe or change your personal options at
https://lists.fhcrc.org/mailman/listinfo/devteam-bioc


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to