Hi,

No ordering was defined until now for GAlignments. Starting with
GAlignments 1.11.7, the elements of a GAlignments object 'x' are
ordered based on the order of the elements in 'granges(x)', that
is, by chromosome, then by strand, then by start, then by end.

Note that I chose a simple ordering definition for GAlignments objects
but it has the following important caveat: == does NOT look at the
cigar. That means that 2 elements in a GAlignments object are considered
equal even if their cigars are different:

> gal <- GAlignments(Rle(factor("chr1"), 2), c(5L, 5L), c("10M", "5M2I5M"), Rle(strand("+"), 2))

> gal
GAlignments object with 2 alignments and 0 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
  [1]     chr1      +         10M        10         5        14        10
  [2]     chr1      +      5M2I5M        12         5        14        10
          njunc
      <integer>
  [1]         0
  [2]         0
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> gal[1] == gal
[1] TRUE TRUE

Another ordering definition would be an ordering that looks at
chromosome/strand/start/end/cigar instead of just chromosome/strand/start/end (i.e. the cigar is used to break ties when looking at
chromosome/strand/start/end only leads to a tie). That would address
the above caveat but it feels weird to use lexicographic ordering of
the cigars in order to break ties. So I'm kind of hesitant to do this.

Anyway now <=, <, ==, !=, pcompare(), order(), sort(), rank(), etc...
work.

Still no match(), selfmatch(), duplicated(), or unique() for GAlignments
objects. Note that these things should adhere to the same notion of
equality as == so they would also ignore the cigar right now if we
wanted to have them. Maybe that's another argument in favor of an
ordering based on chromosome/strand/start/end/cigar.

H.


On 01/08/2017 05:00 PM, Dario Strbenac wrote:
Good day,

When sort is used on a GAlignments object, a stack error is shown, no matter 
how small the object is.

testAlignments
GAlignments object with 3 alignments and 0 metadata columns:
                                          seqnames strand       cigar    qwidth 
    start       end     width     njunc
                                             <Rle>  <Rle> <character> <integer> <integer> 
<integer> <integer> <integer>
  700666F:126:C8768ANXX:3:2204:3175:99484    chr14      +      71S27M        98 
 18386040  18386066        27         0
  700666F:126:C8768ANXX:1:1107:8115:31928    chr14      +      40S60M       100 
 18915005  18915064        60         0
  700666F:126:C8768ANXX:1:2206:7564:34686    chr14      +      40S50M        90 
 18915005  18915054        50         0
  -------
  seqinfo: 23 sequences from an unspecified genome

sort(testAlignments)
Error: C stack usage  7970544 is too close to the limit

I use up-to-date packages.

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        
LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              
LC_ADDRESS=C           LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  
methods   base

other attached packages:
 [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1           Biostrings_2.42.1    
      XVector_0.14.0
 [5] SummarizedExperiment_1.4.0 Biobase_2.34.0             GenomicRanges_1.26.2 
      GenomeInfoDb_1.10.1
 [9] IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] lattice_0.20-34    bitops_1.0-6       grid_3.3.2         zlibbioc_1.20.0    
Matrix_1.2-7.1     BiocParallel_1.8.1
[7] tools_3.3.2

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to