Hi Charles, Vince,

Yes, a PairwiseAlignments object will contain the sequences of the 2 genomes being aligned, so it will be big. This could be mitigated by using one object per chromosome instead of trying to represent the full genome alignment in a single object, but then you lose the ability to represent regions that align across chromosomes.

Other downsides of using PairwiseAlignments are:

- You lose the simple block-to-block mapping that GRangePairs gives you, together with the straightforward way to annotate the links between blocks (via the metadata columns of the GRangePairs).

- A PairwiseAlignments object can only represent replacements and indels, while the block-to-block mapping in a GRangePairs object can also support rearrangements.

- The GRangePairs approach even allows you to represent a many-to-many relationship between the blocks/regions of the 2 genomes, something that a PairwiseAlignments-based approach cannot do.

So the GRangePairs approach seems more flexible.

Maybe a better way to support an arbitrary relationship between the blocks/regions of the 2 genomes would be to use a 3-slot data structure: 2 slots for 2 GRanges objects defining regions on the 2 genomes + 1 slot for representing the links between the regions defined on each genome (these links could be stored in a Hits object). Note that this is a classic bipartite graph. This would particularly make sense if the mapping between the regions is expected to be many-to-many.

This kind of container would be able to represent a side-by-side comparison of 2 arbitrary genomes in its most general form, not just a pairwise genome alignment, which is more restrictive.

Cheers,
H.

On 9/18/20 02:41, Vincent Carey wrote:
> Starting from
>
> PairwiseAlignments-class      package:Biostrings      R Documentation
>
> PairwiseAlignments, PairwiseAlignmentsSingleSubject, and
> PairwiseAlignmentsSingleSubjectSummary objects
>
> Description:
>
>      The ‘PairwiseAlignments’ class is a container for storing a set of
>      pairwise alignments.
>
>      The ‘PairwiseAlignmentsSingleSubject’ class is a container for
>      storing a set of pairwise alignments with a single subject.
>
>      The ‘PairwiseAlignmentsSingleSubjectSummary’ class is a container
>      for storing the summary of a set of pairwise alignments.
>
> Usage:
>
>      ## Constructors:
>      ## When subject is missing, pattern must be of length 2
>      ## S4 method for signature 'XString,XString'
>      PairwiseAlignments(pattern, subject,
>          type = "global", substitutionMatrix = NULL, gapOpening = 0,
>          gapExtension = 1)
>      ## S4 method for signature 'XStringSet,missing'
>      PairwiseAlignments(pattern, subject,
>          type = "global", substitutionMatrix = NULL, gapOpening = 0,
>          gapExtension = 1)
>      ## S4 method for signature 'character,character'
>      PairwiseAlignments(pattern, subject,
>          type = "global", substitutionMatrix = NULL, gapOpening = 0,
>          gapExtension = 1,
>          baseClass = "BString")
>
> ...
>
> my question would be whether this is a relevant starting place? Clearly
> the focus is not on coordinates, but perhaps a structure that maintains
> genomic content and coordinates together would be of use?
>
>
> On Fri, Sep 18, 2020 at 2:49 AM Charles Plessy <charles.ple...@oist.jp>
> wrote:
>
>> Dear Bioc developers,
>>
>> I am currently analysing pairwise genome alignments with Bioconductor,
>> and I represent them with a GRanges object of the first genome,
>> containing one element per alignment block, and storing the coordinates
>> in the other genome in a metadata column containing another GRanges object.
>>
>> Something like this.
>>
>> GRanges object with 36582 ranges and 2 metadata columns:
>>           seqnames      ranges strand |     score                query
>>              <Rle>   <IRanges>  <Rle> | <numeric>            <GRanges>
>>       [1]       S1     162-550      + |       861    XSR:909374-909853
>>       [2]       S1    833-3738      + |      7238    XSR:910181-913291
>>       [3]       S1   3769-4212      + |      1165    XSR:913510-913953
>>       [4]       S1   4246-4381      + |       359    XSR:914134-914275
>>       [5]       S1   4532-5990      + |      2977 chr2:6694031-6695569
>>       ...      ...         ...    ... .       ...                  ...
>>   [36578]      S99 17228-17759      - |       793 chr1:2375870-2376379
>>   [36579]      S99 16417-16935      - |       632 chr1:2376612-2377077
>>   [36580]      S99 12370-12759      - |       773 chr1:2379949-2380343
>>   [36581]      S99   5270-5384      - |       295   chr1:843397-843511
>>   [36582]      S99   1949-3053      - |      2105   chr1:845358-846326
>>   -------
>>
>> Using "Pairwise genome alignment" as a keyword in a search engine, I
>> found that the package CNEr is doing something similar, although it
>> uses a dedicated "GRangePairs" object for the purpose.
>>
>> Before I start to invest time in either direction, I wanted to check on
>> this mailing list whether other solutions already exist, in
>> particular closer to the core packages?
>>
>> Have a nice day,
>>
>> Charles
>>
>> --
>> Charles Plessy - - ~ ~ ~ ~ ~ ~~~~ ~ ~ ~ ~ ~ - - charles.ple...@oist.jp
>> Okinawa Institute of Science and Technology Graduate University
>> Staff scientist in the Luscombe Unit - ~ - https://groups.oist.jp/grsu
>> Toots from work - ~ ~~ ~ - https://mastodon.technology/@charles_plessy
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
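
As a rough illustration of the 3-slot structure suggested in the reply above (2 GRanges objects plus a Hits object linking them), here is a minimal sketch in R. The class name GenomePairLinks, its slot names, the link pattern, and the "type" annotation are all hypothetical and do not come from any existing package; the coordinates are borrowed from the first rows of the example table only to have something to show. GRanges() is from GenomicRanges, and Hits(), from(), to(), nLnode() and nRnode() are from S4Vectors.

```r
library(GenomicRanges)  # GRanges
library(S4Vectors)      # Hits, from(), to(), nLnode(), nRnode()

# Hypothetical container (not an existing Bioconductor class).
setClass("GenomePairLinks",
    slots = c(
        target = "GRanges",  # regions defined on the first genome
        query  = "GRanges",  # regions defined on the second genome
        links  = "Hits"      # edges of the bipartite graph between them
    ),
    validity = function(object) {
        # The left/right node counts of the Hits object must match the
        # number of regions on each side.
        if (nLnode(object@links) != length(object@target))
            return("nLnode(links) must equal length(target)")
        if (nRnode(object@links) != length(object@query))
            return("nRnode(links) must equal length(query)")
        TRUE
    }
)

# Toy regions (coordinates taken from the table in the original post).
target <- GRanges(c("S1:162-550", "S1:833-3738", "S1:3769-4212"))
query  <- GRanges(c("XSR:909374-909853", "XSR:910181-913291"))

# Invented many-to-many links: target region 3 maps to both query
# regions, and query region 1 is reached from target regions 1 and 3.
links <- Hits(from = c(1L, 2L, 3L, 3L), to = c(1L, 2L, 1L, 2L),
              nLnode = length(target), nRnode = length(query))

# Links can be annotated through metadata columns (made-up labels).
mcols(links)$type <- c("aligned", "aligned", "duplicated", "duplicated")

x <- new("GenomePairLinks", target = target, query = query, links = links)

# Query regions linked to the third target region.
linked_query <- x@query[to(x@links)[from(x@links) == 3L]]

# Target regions linked to the first query region.
linked_target <- x@target[from(x@links)[to(x@links) == 1L]]
```

Keeping the links in a separate Hits object, rather than pairing ranges row by row, is what lets a region on one genome take part in any number of links without being duplicated.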