Hi Evan,

The BED12 format has no concept of "0 exons".  If an item has a start
and an end, then it needs at least one block, and the blocks must span
from the start to the end.  If there is no coding/thick region, then
set thickStart = thickEnd.  Also, "=" is not an expected strand
character -- try +, -, or . if unknown.

This line of BED will result in a thin span (no thick area because
thickStart=thickEnd):

chr1 72360320 72469772 March4 0 . 72360320 72360320 0 1 109452, 0,
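
In case it helps, the constraints above can be checked mechanically.
The following is only a sketch in Python: check_bed12 is a made-up
helper name (not part of liftOver or any UCSC tool), and it assumes
one whitespace-separated BED12 item per line.

```python
def check_bed12(line):
    """Return a list of problems found in one BED12 line (empty if none)."""
    fields = line.split()
    if len(fields) != 12:
        return [f"expected 12 fields, got {len(fields)}"]

    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    strand = fields[5]
    thick_start, thick_end = int(fields[6]), int(fields[7])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated; a trailing comma is allowed
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    problems = []
    if strand not in ("+", "-", "."):
        problems.append(f"unexpected strand {strand!r}")
    if block_count < 1:
        problems.append("blockCount must be at least 1")
    if len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockCount does not match blockSizes/blockStarts")
    if starts and starts[0] != 0:
        problems.append("first blockStarts value must be 0")
    if starts and len(starts) == len(sizes):
        # blockStarts are offsets from chromStart, so the last block
        # should end exactly at chromEnd
        if chrom_start + starts[-1] + sizes[-1] != chrom_end:
            problems.append("last block does not end at chromEnd")
    if not (chrom_start <= thick_start <= thick_end <= chrom_end):
        problems.append("thickStart/thickEnd fall outside the item")
    return problems


good = "chr1 72360320 72469772 March4 0 . 72360320 72360320 0 1 109452, 0,"
print(check_bed12(good))
```

For the thin-span line above this returns an empty list.  Run on the
original March4 row, it reports the "=" strand and the zero
blockCount, among other problems.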

BlockStarts are relative to chromStart, so you'll need to subtract
chromStart from each blockStarts value in your file (quoted below).
The first blockStarts value is always 0.
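
A rough Python sketch of that conversion, in case it is useful.
fix_row is a hypothetical helper (not a UCSC utility).  It assumes
each item sits on a single line with absolute blockStarts as in your
file, treats any strand other than + or - as unknown, and turns a
blockCount of 0 into a single block spanning the whole item with
thickStart = thickEnd = chromStart.

```python
def fix_row(line):
    """Rewrite one row with absolute blockStarts into BED12 with relative ones."""
    fields = line.split()
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    # Assumption: anything other than + or - (e.g. "=") means "unknown"
    strand = fields[5] if fields[5] in ("+", "-") else "."
    thick_start, thick_end = int(fields[6]), int(fields[7])

    if int(fields[9]) == 0:
        # No known exons: one block covering the item, no thick region
        sizes = [chrom_end - chrom_start]
        starts = [0]
        thick_start = thick_end = chrom_start
    else:
        sizes = [int(x) for x in fields[10].split(",") if x]
        # Convert absolute block starts to offsets from chromStart
        starts = [int(x) - chrom_start for x in fields[11].split(",") if x]

    out = [
        fields[0], str(chrom_start), str(chrom_end), fields[3], fields[4],
        strand, str(thick_start), str(thick_end), fields[8], str(len(sizes)),
        ",".join(map(str, sizes)) + ",",
        ",".join(map(str, starts)) + ",",
    ]
    return " ".join(out)


print(fix_row("chr1 72360320 72469772 March4 0 = 0 0 0 0 0, 0,"))
```

For the March4 row this prints the same thin-span line shown above.
Whether "." is the right strand for your "=" items is a guess on my
part; substitute + or - if you know the orientation.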

For March4 in particular -- I typed "March4" in the position/search 
box of the mm9 (July 2007) genome browser, and there is a match in the 
UCSC genes track at chr1:72,473,686-72,583,138 that includes exons and 
coding region.  If you have 42000 identifiers that should match the 
UCSC Genes track items then you may be able to use the Table Browser 
to retrieve mm9 coordinates for those (at least as a point of 
comparison or source of missing exon/CDS coords).  If you would like 
to try that or have any other questions, please let us know at 
[email protected].

Hope that helps,
Angie


> Date: Wed, 1 Jul 2009 12:46:29 -0500
> From: evan williams <[email protected]>
> To: [email protected]
> Subject: [Genome] LiftOver BED Format Catch-22 Error
> 
> Hi,
> 
> I'm using the LiftOver utility to update a list of gene and exon
> locations from mm8 to mm9, using the 'mm8ToMm9.over.chain' file. My
> problem is with the BED format and liftOver. I have a text file with
> 42000 rows that looks like so:
> 
> 
> chr8 68547513 69399414 March1 0 = 68896145 69397519 0 8 
> 239,77,209,51,80,181,148,2194, 
> 68547513,68808318,68896035,69205650,69315428,69347804,69385041,69397220
> chr17 33292398 33325385 March2 0 = 33294782 33316635 0 6 
> 2531,135,210,196,228,125, 
> 33292398,33298704,33302721,33309785,33316459,33325260
> chr18 56887084 57050917 March3 0 = 56887883 56937388 0 5 853,210,205,244,242, 
> 56887084,56908387,56933032,56937200,57050675
> chr1 72360320 72469772 March4 0 = 0 0 0 0 0, 0,
> chr19 37272964 37287281 March5 0 = 37273013 37286650 0 6 
> 84,203,131,184,167,748, 37272964,37275814,37282326,37285446,37285782,37286533,
> 
> Using all 12 of the parameters allowed in the BED file format (it's
> ordered properly; http://genome.ucsc.edu/FAQ/FAQformat#format1). The
> problem lies in the blockCounts. If I have zero known exons, as is
> the case in the gene March4, the program throws me a Catch-22. I am
> required to put "0" in the blockCount when there are no exons; the
> program breaks if I leave it as null. However, if I put "0 0, 0,"
> the program throws (as expected) a segmentation fault at me (since I
> told it to expect 0 elements, but it got 1).
> 
> e.g. if I run a file using only the first 3 lines:
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain 
> GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates"
> 
> but if I include line 4 I either get (with blockCount = 0)
> 
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain
> GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates
> Segmentation fault"
> 
> or (removing blockCount; if I say 0 blockCount and remove blockSizes
> and blockStarts I also get an error, just "got 10" instead of "got 9")
> 
> "$ ./liftOver GeneListmm8_LiftOverThis.txt mm8ToMm9.over.chain
> GeneListmm9.txt unMapped.txt
> Reading liftover chains
> Mapping coordinates
> Expecting 12 words line 4 of GeneListmm8_LiftOverThis.txt got 9"
> 
> I suppose I could do this by splitting it into two files; one with
> genes with exons, the other with genes without, but that seems clumsy.
> Am I missing something or doing something wrong, or is this a bug in
> liftOver?
> 
> 
> Thank you.
> Evan Williams
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> 
