Re: [Bioc-sig-seq] as.data.frame on GRanges object with DNAStringSet in values

Hervé Pagès Wed, 15 Jun 2011 16:28:35 -0700

On 11-06-15 03:38 PM, Michael Lawrence wrote:



2011/6/15 Hervé Pagès <[email protected]>

    Hi Michael, Janet,

    I just added an "as.vector" method for XStringSet objects to
    Biostrings 2.21.6:

     > library(Biostrings)
     > x <- DNAStringSet(c("aaatg", "gt"))
     > as.vector(x)
      [1] "AAATG" "GT"

    But that doesn't solve Janet's problem:

     > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
     > df
      DataFrame with 2 rows and 2 columns
                 id           seqs
        <character> <DNAStringSet>
      1         ID1          AAATG
      2         ID2             GT
     > as.data.frame(df)

      Error in as.data.frame.default(y, optional = TRUE, ...) :
        cannot coerce class 'structure("DNAStringSet", package =
    "Biostrings")' into a data.frame

    Michael?


Well, sorry for that. I just added a coercion from Vector to data.frame
through as.vector, so this works.
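
As a rough illustration of what "coercion to data.frame through as.vector" can look like, here is a minimal sketch on a toy S4 class. ToySeqSet and its seqs slot are hypothetical stand-ins invented for this example; this is not the actual Vector or XStringSet code in IRanges/Biostrings.

```r
library(methods)

# Hypothetical stand-in for a sequence container (NOT the Biostrings class).
setClass("ToySeqSet", representation(seqs = "character"))

# as.vector returns the underlying character data, one element per sequence,
# so the length of the input is preserved.
setMethod("as.vector", "ToySeqSet",
          function(x, mode = "any") as.vector(x@seqs, mode = mode))

# Coercion to data.frame is routed through as.vector.
setMethod("as.data.frame", "ToySeqSet",
          function(x, row.names = NULL, optional = FALSE, ...)
              data.frame(seqs = as.vector(x), stringsAsFactors = FALSE))

x <- new("ToySeqSet", seqs = c("AAATG", "GT"))
df <- as.data.frame(x)
print(df)
```

With this in place, any class that supplies a sensible as.vector method gets a one-column data.frame without further work, which is the appeal of routing the coercion that way.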


Thanks!

But someone might add a coercion from
List to data.frame that would treat the elements as columns. Would this
make sense?


Hard to tell. Maybe sometimes it would make sense, but sometimes it
definitely does not (e.g. DNAStringSet).

AtomicList to data.frame does something even stranger: it
creates a two column data frame with the unlisted values and
names/indices rep'd out as a factor. Actually, that's kind of cool,
since usually one does not have a list with equal element lengths, but
it's somewhat unintuitive. But why does it apply only to AtomicList?


Glad you brought this up.

For the record, "as.vector" also unrolls an AtomicList:

  > as.vector(IntegerList(1:4, 0:-2))
  [1]  1  2  3  4  0 -1 -2

IMO, we should not do things like that. Because:

  1) The same can be achieved with unlist():

    > unlist(IntegerList(1:4, 0:-2))
    [1]  1  2  3  4  0 -1 -2

  2) It's totally unintuitive to use as.vector for unlisting
     a list (as.vector on a standard list does not do that).

  3) There is a strong expectation that as.vector() will preserve
     the length of its input.

So I propose to deprecate those "as.vector" and "as.data.frame"
methods for AtomicList objects.
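
The base R behaviour behind points 2) and 3) can be checked on a standard list. This is a minimal sketch using only base R (no IRanges objects involved); the variable names are just for illustration.

```r
# A standard list with the same contents as the IntegerList example above.
x <- list(1:4, 0:-2)

# as.vector() on a standard list leaves it a list of the same length.
v <- as.vector(x)
print(is.list(v))
print(length(v) == length(x))

# unlist() is the function that flattens the elements into one vector.
u <- unlist(x)
print(u)
print(length(u))
```

Here as.vector() keeps the two-element list intact, while unlist() yields the seven flattened values, which is why unrolling through as.vector is surprising.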

H.

Anyway, given the special correspondence between an XStringSet and a
character vector, we could always add an as.data.frame method for
XStringSet, just to make sure stuff behaves as expected.

    Thanks,
    H.


     > sessionInfo()
    R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
      [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
      [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
      [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
      [7] LC_PAPER=C                 LC_NAME=C
      [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C


    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] Biostrings_2.21.6 IRanges_1.11.10



    On 11-06-15 12:49 PM, Janet Young wrote:

        yes - as.character seems a good choice, I think

        thanks,

        Janet

        On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:

            So you would expect that the DNAStringSet is converted to a
            character vector? DNAStringSet (technically XStringSet) then
            just needs an as.vector method that delegates to as.character.

            Michael


            On Wed, Jun 15, 2011 at 12:37 PM, Janet Young
            <[email protected]> wrote:

            Hi there,

            I'm trying to use as.data.frame on a GRanges object. On
            regular GRanges objects it works fine, but I have some
            objects that contain a DNAStringSet in the values column,
            which the as.data.frame method doesn't handle. Is it
            possible to add the ability to coerce the DNAStringSet too,
            please?

            Here's some code that demonstrates the issue:

            ################
            library(GenomicRanges)
            library(Biostrings)

            gr1 <- GRanges(seqnames=rep("chr1",3),
                           ranges=IRanges(start=c(1,101,201), width=50),
                           strand=c("+","-","+"),
                           genenames=c("seq1","seq2","seq3"))

            as.data.frame(gr1)
            # works

            gr2 <- gr1
            values(gr2)[, "myseqs"] <- DNAStringSet(c("AACGTG",
                "ACGGTGGTGTT", "GAGGCTG"))

            as.data.frame(gr2)
            # Error in as.data.frame.default(y, optional = TRUE, ...) :
            #   cannot coerce class 'structure("DNAStringSet", package =
            #   "Biostrings")' into a data.frame
            ################

            and here's the sessionInfo() output:

            R version 2.13.0 (2011-04-13)
            Platform: i386-apple-darwin9.8.0/i386 (32-bit)

            locale:
            [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

            attached base packages:
            [1] stats     graphics  grDevices utils     datasets  methods   base

            other attached packages:
            [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4

            ################


            You might wonder why I'm storing sequences in the GRanges
            values. In my real data they're sequencing reads that have
            mapped back to that region, but I still want to keep the
            sequence itself (for the moment) because it's not always
            identical to the underlying genomic sequence of that region
            (I'm investigating mapping issues).

            (My desire to use as.data.frame relates to a suggestion
            from Hervé to let me work around some issues with the
            identical function.)

            thanks,

            Janet

            _______________________________________________
            Bioc-sig-sequencing mailing list
            [email protected]
            <mailto:[email protected]>
            https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing





    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: [email protected] <mailto:[email protected]>
    Phone:  (206) 667-5791
    Fax:    (206) 667-1319



