Re: [Bioc-sig-seq] Reads in 3'utr

Valerie Obenchain Mon, 26 Sep 2011 19:45:43 -0700

On 09/26/11 09:16, rohan bareja wrote:

Yes, you need to merge the counts for each gene.


see ?split

Split the counts on the gene IDs then sum with lapply. Something like,

    lapply(split(df$counts, df$geneID), sum)

Valerie

Hi Valerie,

Thanks a lot..It worked finally.. So now I have a data frame for thegeneIds ,TranscriptIds and the counts (3'utr) which is given below:


  GENE     TX     countsUTRctl
[1,] "148398" "1121" "2"
[2,] "339451" "1118" "0"
[3,]"84069"  "1116" "0"
[4,] "84069"  "1119" "11"
[5,] "9636"   "1126" "11"
[6,] "375790" "1127" "0"

Now I want to do differential expression of genes using DESeq,so do Ihave to merge the two same genes and its counts such as geneID 84069(from above ) or i can proceed with the above dataframe?

If I have to merge them how do I do that?

Thanks,
Rohan

--- On *Sat, 24/9/11, Valerie Obenchain /<voben...@fhcrc.org>/* wrote:


    From: Valerie Obenchain <voben...@fhcrc.org>
    Subject: Re: [Bioc-sig-seq] Reads in 3'utr
    To: "rohan bareja" <rohan_1...@yahoo.co.in>
    Cc: bioc-sig-sequencing@r-project.org
    Date: Saturday, 24 September, 2011, 4:24 AM

    On 09/23/2011 02:57 PM, rohan bareja wrote:

    Hi,

    utr=threeUTRsByTranscript(txdb,use.names=FALSE)
    So,utr is GRangesList of length 33381
    Then as u said,I did the following:

    txBygene <- transcriptsBy(txdb, "gene")
       geneID <- rep(names(txBygene), elementLengths(txBygene))
       df <- data.frame(geneID=geneID,
    txID=values(unlist(txBygene))[["tx_id"]])

     This gives me a dataframe with 40,780 rows with gene ID and txID
    from txBygene object.
              geneID  txID
    40775   9994 11731
    40776   9994 11730
    40777   9997 38491
    40778   9997 38489
    40779   9997 38496
    40780   9997 38497

    Since my utr object is of length 33,381 ,my counts length is same
    i.e 33,381
    So I am not able to map the counts to the above data frame which
    has transcript and gene IDs.


    Yes, these lengths are different.

    In this example we have utr regions from 58 transcripts.

    > length(utr)
    [1] 58


    Those 58 transcripts can be matched to their gene ID's by looking
    at the txBygene object. All of the transcripts fall into one (or
    more) of 51 genes,

    > length(txBygene)
    [1] 51

    There are multiple transcripts per gene so we expand the gene ID's
    to map to the transcripts.

    > dim(df)
    [1] 79  2

    This data.frame has all transcripts from the txdb mapped to the
    gene ID's. Your utr data may contain only a subset of these
    transcripts. That is something you need to check.  Match the
    desired transcript names to the df, pull out the gene IDs. You
    then have the gene ID's for your utr regions and can split or
    group your counts by gene.

    Valerie




    --- On *Fri, 23/9/11, Valerie Obenchain /<voben...@fhcrc.org>
    </mc/compose?to=voben...@fhcrc.org>/*wrote:


        From: Valerie Obenchain <voben...@fhcrc.org>
        </mc/compose?to=voben...@fhcrc.org>
        Subject: Re: [Bioc-sig-seq] Reads in 3'utr
        To: "rohan bareja" <rohan_1...@yahoo.co.in>
        </mc/compose?to=rohan_1...@yahoo.co.in>
        Cc: bioc-sig-sequencing@r-project.org
        </mc/compose?to=bioc-sig-sequencing@r-project.org>
        Date: Friday, 23 September, 2011, 10:50 PM

        Hi Rohan,

        You can relate the counts for 3UTR regions to gene IDs
        through the transcript IDs.

            txdb_file <- system.file("extdata",
        "UCSC_knownGene_sample.sqlite", package="GenomicFeatures")
            txdb <- loadFeatures(txdb_file)
            utr=threeUTRsByTranscript(txdb,use.names=FALSE)


        The transcript names can be matched to the gene ID's through,

            txBygene <- transcriptsBy(txdb, "gene")
            geneID <- rep(names(txBygene), elementLengths(txBygene))
            df <- data.frame(geneID=geneID,
        txID=values(unlist(txBygene))[["tx_id"]])

        Now you know what gene ID each tx count belongs to. You can
        split your counts by gene ID ...


        Valerie



        On 09/20/2011 12:13 PM, rohan bareja wrote:

        Hi everyone,

I am doing NGS analysis using bam files.I have counted reads in 3'utr region usingutr=threeUTRsByTranscript(txdb,use.names=FALSE)

        countsUTR<- countOverlaps(utr,reads)
        I have got the transcript level counts from this.How can I get the gene 
level counts??It might sound silly but Does anybody have an idea on what type 
of anaylses we can do from this countsUTR ?
        Thanks,Rohan
                [[alternative HTML version deleted]]



        _______________________________________________
        Bioc-sig-sequencing mailing list
        Bioc-sig-sequencing@r-project.org
        https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Reads in 3'utr

Reply via email to