Re: [Bioc-sig-seq] Comparing two chipseq position sets

Ivan Gregoretti Wed, 20 May 2009 14:02:45 -0700

Hello Nicolas and everybody,

Have you managed to get the output of the peak finder MACS into a
genomeIntervals object?



The output of MACS as seen through a data.frame object looks like this

> head(A, 10)
    chr   start     end length summit tags Tpvalue fold_enrichment  FDR
1  chr1 3660782 3662707   1926    749   64  492.04           38.44 0.11
2  chr1 4481743 4482656    914    179   14   83.64           10.01 0.88
3  chr1 4482814 4484003   1190    671   18  108.48           16.43 0.17
4  chr1 4561321 4562262    942    472   13  136.26           30.96 0.04
5  chr1 4774888 4776304   1417    781   60  674.41           78.07 0.15
6  chr1 4797292 4798822   1531    813   56  382.46           42.73 0.06
7  chr1 4847808 4848846   1039    533   26  325.36           82.17 0.05
8  chr1 5008094 5009386   1293    973   20  117.98           13.28 0.08
9  chr1 5009515 5010046    532    179   10   94.86           24.65 0.38
10 chr1 5010096 5010583    488    170    6   52.31           20.54 5.39


My attempt to make it a genomeIntervals object goes as far as this:

giA <- new("Genome_intervals", A[ ,2:3], closed=c(TRUE,TRUE),
annotation=data.frame(A[ ,1], inter_base=FALSE, strand="+", A[ ,4:9]))

The operation reports no errors but it doesn't really work because I can't
use it. For instance, I can't make a simple call to the first three elements
like this

> giA[1:3]
Error in .nextMethod(x, i, j, ..., drop) : subscript out of bounds

Any insights?

Thank you,

Ivan



Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878


On Thu, May 7, 2009 at 11:29 AM, Nicolas Delhomme <[email protected]> wrote:

> Hi Ivan,
>
> I'm not yet so familiar with iRanges, just starting to use it. By the way,
> I use the opportunity to thank the guys behing those libraries (ShortRead,
> chipseq, iRanges, etc.) as they are doing a tremendous work. Chapeau bas
> messieurs!
> Back to your question, the fact that A and B have different size is not a
> problem for genomeIntervals. The interval_overlap function will return a
> list, which for every interval of A will have a vector of the position in B
> of the corresponding overlapping interval.
> I agree, this is confusing, so here is the bit of code applied to for your
> example:
>
> require(genomeIntervals)
>
> A.set<-new("Genome_intervals",
>                matrix(
>                        c(3660781,3662707,
>                        4481742,4482656,
>                        4482813,4484003,
>                        4561320,4562262,
>                        4774887,4776304,
>                        4797291,4798822,
>                        4847807,4848846,
>                        5008093,5009386,
>                        5009514,5010046,
>                        5010095,5010583),ncol=2,byrow=TRUE),
>                closed=c(TRUE,TRUE),
>                annotation=data.frame(
>                seq_name=rep("chr1",10),
>                inter_base=FALSE,
>                strand="+"
>                )
>           )
>
> B.set<-new("Genome_intervals",
>                matrix(
>                        c(
>                         3659579,3662079,
>                          4773791,4776291,
>                           4797473,4799973,
>                            4847394,4849894,
>                             5007460,5009960,
>                              5072753,5075253,
>                               6204242,6206742,
>                                7078730,7081230,
>                                 9282452,9284952,
>                                  9683423,9685923),ncol=2,byrow=TRUE),
>                closed=c(TRUE,TRUE),
>                annotation=data.frame(
>                seq_name=rep("chr1",10),
>                inter_base=FALSE,
>                strand="+"
>                )
>           )
>
> A.B.overlap<-interval_overlap(A.set,B.set)
>
> A.B.overlap
>
> [[1]]
> [1] 1
>
> [[2]]
> integer(0)
>
> [[3]]
> integer(0)
>
> [[4]]
> integer(0)
>
> [[5]]
> [1] 2
>
> [[6]]
> [1] 3
>
> [[7]]
> [1] 4
>
> [[8]]
> [1] 5
>
> [[9]]
> [1] 5
>
> [[10]]
> integer(0)
>
> As you can see the first interval in A overlap with the first in B; the
> fifth in A with the second in B and so on...
>
> The comment from Steve are very good, I've read about it quite often in the
> literature, especially for ChIP-chip but I don't know if there's a package
> around. Let me know if you find/know about one.
>
> Best,
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: [email protected]
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
>
>
> On 7 May 2009, at 16:02, Ivan Gregoretti wrote:
>
>  Hello Steve, Nicolas and Michael,
>>
>> I agree with all of you: it is not a trivial question.
>>
>> I asked the bioc-sig-seq listers because I thought, --Hey, this must
>> be the everyday's question of the genome analyst.
>>
>> Say you ran your chipseq under condition A and then you ran it under
>> condition B. Then you have to decide whether A and B made any
>> difference. It doesn't get any simpler than that!
>>
>> I can't compare the two means or the two dispersions. I have to
>> compare pairs. The problem is that it is not trivial to unambiguously
>> determine which spot in B must be paired with each spot in A. To start
>> with, A and B may have different numbers of loci (ie 15000 versus
>> 18000).
>>
>> I'll take a look at genomeIntervals and IRanges.
>>
>> By the way, Michael, would you let me know as soon as the new IRanges
>> documentation comes out? You guys were working on something, I
>> understand.
>>
>> Thank you all,
>>
>> Ivan
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>>
>>
>> On Thu, May 7, 2009 at 9:24 AM, Michael Lawrence <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Wed, May 6, 2009 at 12:40 PM, Ivan Gregoretti <[email protected]>
>>> wrote:
>>>
>>>>
>>>> Hello Bioc-sig-seq,
>>>>
>>>> Say you run your ChIP-seq and find binding positions like this
>>>>
>>>> chr1  3660781  3662707
>>>> chr1  4481742  4482656
>>>> chr1  4482813  4484003
>>>> chr1  4561320  4562262
>>>> chr1  4774887  4776304
>>>> chr1  4797291  4798822
>>>> chr1     4847807  4848846
>>>> chr1  5008093  5009386
>>>> chr1  5009514  5010046
>>>> chr1  5010095  5010583
>>>> ...[many more loci and chromosomes]...
>>>>
>>>> Then you want to compare it to published data like this
>>>>
>>>> chr1  3659579  3662079
>>>> chr1  4773791  4776291
>>>> chr1  4797473  4799973
>>>> chr1  4847394  4849894
>>>> chr1  5007460  5009960
>>>> chr1  5072753  5075253
>>>> chr1  6204242  6206742
>>>> chr1  7078730  7081230
>>>> chr1  9282452  9284952
>>>> chr1  9683423  9685923
>>>> ...[many more loci and chromosomes]...
>>>>
>>>> What method would you use to test whether these two lists are
>>>> significantly different?
>>>>
>>>
>>> This is a tough statistical question that probably needs to be a bit more
>>> specific, but as far as technical tools, in addition to genomeIntervals
>>> there is the IRanges package and its efficient "overlap" function.
>>> IRanges
>>> is well integrated with the rest of sequence analysis infrastructure in
>>> Bioconductor.
>>>
>>>
>>>> Any pointer would be appreciated.
>>>>
>>>> Ivan
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>> 5 Memorial Dr, Building 5, Room 205.
>>>> Bethesda, MD 20892. USA.
>>>> Phone: 1-301-496-1592
>>>> Fax: 1-301-496-9878
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> [email protected]
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>
>>>
>>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Comparing two chipseq position sets

Reply via email to