Hi Daniel...

On Wed, 2015-07-01 at 23:09 +0200, Daniel Dörr wrote:
> Would you recommend to partition each input sequence into multiple records of
> a multi-fasta file so as to omit large masked regions? As I understood from
> the manual and from previous posts on this mailing list, progressiveMauve
> concatenates all records in multi-fasta files of the input. I guess my
> question is: in constructing the backbone, does it prevent homologous
> segments to cover more than one record of a multi-fasta file?

That's a great question. Currently the backbone entries are not split up
by contig, but are reported in the concatenated coordinate space. So cutting
out the repeat-masked regions and breaking contigs would prevent these
regions from becoming aligned, but would potentially add extra complexity
to interpreting the backbone file.

>
> >> 2) I observe sometimes strange lines in the backbone file such as the
> >> following:
> >> ___
> >> 7691835 7691966 -85715547 -85715547 0 0 0 0 349474437 349474583 -700243823 -700243822 0 0
> >> 8282300 8282275 0 0 0 0 0 0 0 0 0 0 0 0
> >> ___
> >>
> >> Note that in the first line, the segments specified by columns [3,4] and
> >> [11, 12] have lengths 0 and -1, respectively. Negative lengths mostly
> >> occur for segments that are not homologous to segments in other genomes,
> >> as shown in the second line (which makes me wonder why they are included
> >> in the backbone file in the first place).
> >
> > I've not seen this before but yes it does seem like a bug! As a
> > workaround, is it possible to ignore these segments in your downstream
> > processing until I can get a fix?
>
> Yes, currently I identify and discard these homologous blocks when processing
> the backbone file. I like to note that these “strange lines” occur extremely
> rarely in my dataset - only 89 out of 1693288 lines in the backbone file
> contain entries of negative segment length.
>

OK, good to know the extent of the problem is relatively small.

--
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia

http://darlinglab.org
twitter: @koadman

_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
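
For anyone wanting to apply the workaround discussed in this thread, here is a minimal Python sketch of one way to set the odd backbone lines aside. It is not part of Mauve. The function names (`segment_lengths`, `is_suspicious`, `filter_backbone`) and the `min_length` parameter are made up for illustration. The sketch assumes that each backbone line holds whitespace-separated integer (left, right) pairs, one pair per genome, that a `0 0` pair means the genome is absent from the block, that negative coordinates mark reverse orientation, and that a non-numeric first line is a header. Segment length is taken as |right| - |left|, which is the convention that reproduces the 0 and -1 figures quoted above; adjust `min_length` if your definition of a valid segment differs.

```python
def segment_lengths(fields):
    """Return one length per genome; None where the genome is absent (0 0).

    Assumption: length is |right| - |left|, matching the figures in the thread.
    """
    values = [int(v) for v in fields]
    pairs = zip(values[0::2], values[1::2])
    return [None if left == 0 and right == 0 else abs(right) - abs(left)
            for left, right in pairs]


def is_suspicious(fields, min_length=1):
    """True if any present segment is shorter than min_length (hypothetical cutoff)."""
    return any(n is not None and n < min_length
               for n in segment_lengths(fields))


def filter_backbone(lines, min_length=1):
    """Split backbone lines into (kept, dropped) without modifying them."""
    kept, dropped = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            int(fields[0])
        except ValueError:
            kept.append(line)  # assumed header line, passed through unchanged
            continue
        (dropped if is_suspicious(fields, min_length) else kept).append(line)
    return kept, dropped
```

Calling `filter_backbone(open(path))` on a backbone file (with `path` pointing at your own file) returns the lines to keep and the lines set aside, so the dropped ones can still be counted or inspected later.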