Hi Evan,

The BED12 format has no concept of "0 exons". If an item has a start and an end, then one or more blocks must span the start and end. If there is no coding/thick region, then thickStart = thickEnd. Also, "=" is not an expected strand character -- try +, -, or . if unknown.
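For illustration, here is a minimal Python sketch of those rules applied to a single row. The function name normalize_bed12 is hypothetical (it is not part of liftOver or any UCSC tool). It assumes whitespace-separated fields, and it assumes that a row whose first blockStarts value equals chromStart holds absolute positions that should be converted to offsets from chromStart:

```python
def normalize_bed12(line):
    """Rewrite one whitespace-separated BED12 row so it follows the rules above.

    Hypothetical helper for illustration; not part of liftOver or any UCSC tool.
    """
    fields = line.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")

    chrom, name, score, item_rgb = fields[0], fields[3], fields[4], fields[8]
    chrom_start, chrom_end = int(fields[1]), int(fields[2])

    # "=" is not a valid strand; use "." when the strand is unknown.
    strand = fields[5] if fields[5] in ("+", "-", ".") else "."

    if int(fields[9]) == 0:
        # No known exons: emit a single block covering the whole item,
        # with an empty thick region (thickStart == thickEnd).
        thick_start = thick_end = chrom_start
        sizes, starts = [chrom_end - chrom_start], [0]
    else:
        thick_start, thick_end = int(fields[6]), int(fields[7])
        sizes = [int(x) for x in fields[10].split(",") if x]
        starts = [int(x) for x in fields[11].split(",") if x]
        # Assumption: a first value equal to chromStart means the row holds
        # absolute positions, so convert them to offsets from chromStart.
        if starts[0] == chrom_start:
            starts = [s - chrom_start for s in starts]

    out = [chrom, chrom_start, chrom_end, name, score, strand,
           thick_start, thick_end, item_rgb, len(sizes),
           ",".join(map(str, sizes)) + ",",
           ",".join(map(str, starts)) + ","]
    return "\t".join(map(str, out))


print(normalize_bed12("chr1 72360320 72469772 March4 0 = 0 0 0 0 0, 0,"))
```

For the March4 row this yields one 109452-base block at offset 0 with strand "." and thickStart = thickEnd = chromStart. Rows that already use relative blockStarts pass through with only the strand corrected.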
This line of BED will result in a thin span (no thick area because thickStart=thickEnd):

    chr1 72360320 72469772 March4 0 . 72360320 72360320 0 1 109452, 0,

BlockStarts are relative to chromStart, so you'll need to subtract chromStart from each blockStarts value below. The first blockStarts value is always 0.

For March4 in particular -- I typed "March4" in the position/search box of the mm9 (July 2007) genome browser, and there is a match in the UCSC genes track at chr1:72,473,686-72,583,138 that includes exons and coding region. If you have 42000 identifiers that should match the UCSC Genes track items then you may be able to use the Table Browser to retrieve mm9 coordinates for those (at least as a point of comparison or source of missing exon/CDS coords).

If you would like to try that or have any other questions, please let us know at [email protected].

Hope that helps,
Angie

> Date: Wed, 1 Jul 2009 12:46:29 -0500
> From: evan williams <[email protected]>
> To: [email protected]
> Subject: [Genome] LiftOver BED Format Catch-22 Error
>
> Hi,
>
> I'm using the LiftOver utility to update a list of gene and exon
> locations from mm8 to mm9, using the 'mm8ToMm9.over.chain' file. My
> problem is with the BED format and liftOver.
> I have a text file with 42000 rows that looks like so:
>
>
> chr8 68547513 69399414 March1 0 = 68896145 69397519 0 8 239,77,209,51,80,181,148,2194, 68547513,68808318,68896035,69205650,69315428,69347804,69385041,69397220
> chr17 33292398 33325385 March2 0 = 33294782 33316635 0 6 2531,135,210,196,228,125, 33292398,33298704,33302721,33309785,33316459,33325260
> chr18 56887084 57050917 March3 0 = 56887883 56937388 0 5 853,210,205,244,242, 56887084,56908387,56933032,56937200,57050675
> chr1 72360320 72469772 March4 0 = 0 0 0 0 0, 0,
> chr19 37272964 37287281 March5 0 = 37273013 37286650 0 6 84,203,131,184,167,748, 37272964,37275814,37282326,37285446,37285782,37286533,
>
> Using all 12 of the parameters allowed in the BED file format (it's
> ordered properly; http://genome.ucsc.edu/FAQ/FAQformat#format1). The
> problem lies in the blockCounts. If I have zero known exons, as is
> the case in the gene March4, the program throws me a Catch-22. I am
> required to put "0" in the blockCount when there are no exons; the
> program breaks if I leave it as null. However, if I put "0 0, 0,"
> the program throws (as expected) a segmentation fault at me (since I
> told it to expect 0 elements, but it got 1).
>
> e.g.
> if I run a file using only the first 3 lines:
>
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates"
>
> but if I include line 4 I either get (with blockCount = 0)
>
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates
> Segmentation fault"
>
> or (removing blockCount; if I say 0 blockCount and remove blockSizes
> and blockStarts I also get an error, just "got 10" instead of "got 9")
>
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates
> Expecting 12 words line 4 of GeneListmm8_LiftOverThis.txt got 9"
>
> I suppose I could do this by splitting it into two files; one with
> genes with exons, the other with genes without, but that seems clumsy.
> Am I missing something or doing something wrong, or is this a bug in
> liftOver?
>
>
> Thank you.
> Evan Williams
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
