Hi Robert,

I've made the following changes to the seqlevelsStyle() getter and
setter:
  1) No more warning when the getter returns more than 1 compatible
     style.
  2) When more than 1 style is supplied, the setter uses the 1st one
     only with a warning.

That should address the issues you reported below. I'm sure the behavior
of the seqlevelsStyle() setter could be refined when more than 1 style
is supplied but the new behavior should get us going for now.

These changes are in GenomeInfoDb 1.6.3 (release) and 1.7.5 (devel).

Note that the Ensembl mappings were contributed by Jo last week (thanks
Jo) and they're indeed the same as the NCBI mappings for Human but they
differ for all the other organisms supported by GenomeInfoDb at the
moment. Anyway, generally speaking, it sounds like the user should be
able to do seqlevelsStyle(x) <- "Ensembl" independently of whether this
will result in seqlevels that are the same as if s/he had done
seqlevelsStyle(x) <- "NCBI".

Cheers,
H.

On 01/25/2016 02:39 AM, Robert Castelo wrote:
hi,

i would like to ask if current line #142 of
GenomeInfoDb/R/seqlevelsStyle.R:

         message("warning! Multiple seqlevels styles found.")

could be replaced by

         warning("Multiple seqlevels styles found.")

since, after all, the message is a warning.

the reason for my request is that a recent update on GenomeInfoDb added
the 'Ensembl' sequence style and this executes the previous line when i
try to figure out the sequence style of a BAM file produced from human
sequence data and GATK. for instance:

path2bam <- file.path(system.file("extdata",
package="VariantFiltering"), "NA12878.subset.bam")

hdr <- scanBamHeader(path2bam)
names(hdr[[1]]$targets)
  [1] "1"          "2"          "3"          "4"          "5"
  [6] "6"          "7"          "8"          "9"          "10"
[11] "11"         "12"         "13"         "14"         "15"
[16] "16"         "17"         "18"         "19"         "20"
[21] "21"         "22"         "X"          "Y"          "MT"
[26] "GL000207.1" "GL000226.1" "GL000229.1" "GL000231.1" "GL000210.1"
[31] "GL000239.1" "GL000235.1" "GL000201.1" "GL000247.1" "GL000245.1"
[36] "GL000197.1" "GL000203.1" "GL000246.1" "GL000249.1" "GL000196.1"
[41] "GL000248.1" "GL000244.1" "GL000238.1" "GL000202.1" "GL000234.1"
[46] "GL000232.1" "GL000206.1" "GL000240.1" "GL000236.1" "GL000241.1"
[51] "GL000243.1" "GL000242.1" "GL000230.1" "GL000237.1" "GL000233.1"
[56] "GL000204.1" "GL000198.1" "GL000208.1" "GL000191.1" "GL000227.1"
[61] "GL000228.1" "GL000214.1" "GL000221.1" "GL000209.1" "GL000218.1"
[66] "GL000220.1" "GL000213.1" "GL000211.1" "GL000199.1" "GL000217.1"
[71] "GL000216.1" "GL000215.1" "GL000205.1" "GL000219.1" "GL000224.1"
[76] "GL000223.1" "GL000195.1" "GL000212.1" "GL000222.1" "GL000200.1"
[81] "GL000193.1" "GL000194.1" "GL000225.1" "GL000192.1" "NC_007605"
[86] "hs37d5"

seqlevelsStyle(names(hdr[[1]]$targets))
warning! Multiple seqlevels styles found.
[1] "NCBI"    "Ensembl"

this is all fine in interactive mode so that the user is aware that
he/she may have to take action on  this fact. however, at a specific
place of my package VariantFiltering i'd like to take action without
telling anything to the user. for that purpose i would like to use
suppressWarnings() but does not work currently with the message() call.
i could use suppressMessages() but would not prefer to.

on a side note, both NCBI and Ensembl sequence style seem to be identical:

genomeStyles()$Homo_sapiens
    circular  auto   sex NCBI  UCSC dbSNP Ensembl
1     FALSE  TRUE FALSE    1  chr1   ch1       1
2     FALSE  TRUE FALSE    2  chr2   ch2       2
3     FALSE  TRUE FALSE    3  chr3   ch3       3
4     FALSE  TRUE FALSE    4  chr4   ch4       4
5     FALSE  TRUE FALSE    5  chr5   ch5       5
6     FALSE  TRUE FALSE    6  chr6   ch6       6
7     FALSE  TRUE FALSE    7  chr7   ch7       7
8     FALSE  TRUE FALSE    8  chr8   ch8       8
9     FALSE  TRUE FALSE    9  chr9   ch9       9
10    FALSE  TRUE FALSE   10 chr10  ch10      10
11    FALSE  TRUE FALSE   11 chr11  ch11      11
12    FALSE  TRUE FALSE   12 chr12  ch12      12
13    FALSE  TRUE FALSE   13 chr13  ch13      13
14    FALSE  TRUE FALSE   14 chr14  ch14      14
15    FALSE  TRUE FALSE   15 chr15  ch15      15
16    FALSE  TRUE FALSE   16 chr16  ch16      16
17    FALSE  TRUE FALSE   17 chr17  ch17      17
18    FALSE  TRUE FALSE   18 chr18  ch18      18
19    FALSE  TRUE FALSE   19 chr19  ch19      19
20    FALSE  TRUE FALSE   20 chr20  ch20      20
21    FALSE  TRUE FALSE   21 chr21  ch21      21
22    FALSE  TRUE FALSE   22 chr22  ch22      22
23    FALSE FALSE  TRUE    X  chrX   chX       X
24    FALSE FALSE  TRUE    Y  chrY   chY       Y
25     TRUE FALSE FALSE   MT  chrM  chMT      MT

this leads to the previous use case with a BAM file, and also to this
other one:

library(BSgenome.Hsapiens.NCBI.GRCh38)

seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
warning! Multiple seqlevels styles found.
[1] "NCBI"    "Ensembl"

note that now if you have a UCSC-style GRanges object like this one:

gr <- GRanges(c("chr1", "chr2"), IRanges(c(10, 20), c(30, 40)))
seqlevelsStyle(gr)
[1] "UCSC"

that you want to use with the BSgenome object, the following simple
operation will not work anymore:

seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
warning! Multiple seqlevels styles found.
Error in mapSeqlevels(x_seqlevels, value, drop = FALSE) :
   the supplied seqname style must be a single string

of course, this has an easy fix:

seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)[1]
warning! Multiple seqlevels styles found.

but in my view, we loose simplicity in the grammar of moving back and
forth between sequence styles. one workaround to keep the former simpler
syntax would be that 'seqlevelsStyle()<-' takes the first value and
gives a warning instead of prompting an error.

in any case, just out of curiosity, what is the reason to have two
identical sequence styles under different names?


thanks!

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to