Gentlemen,

thanks for your comments, they have been very helpful! One thought, though: does the lincRNA track also take into account the lincRNAs that have been published in the literature so far? I have also noticed a database of lncRNAs, http://www.lncrnadb.org, although the set of lncRNAs there looks rather small at the moment.

Thanks very much,
bogdan

=========================
Bogdan Tanasa, MD
TSRI/HHMI
[email protected]

On Fri, Dec 3, 2010 at 3:17 AM, Ewan Birney <[email protected]> wrote:

> On Fri, 3 Dec 2010, Maximilian Haussler wrote:
>
>> Very interesting thread!
>>
>> Bogdan, if you want to combine the data from the two URLs that Ewan sent
>> you, be aware that UCSC is at Version 59 of Ensembl and the Biomart link
>> points to version 60 of Biomart, so if Ensembl has changed anything from
>> version 59 to version 60 for the human assembly (don't know how to find
>> this info on the web at the moment), then you might want to use the
>> Version 59 Biomart at
>> http://aug2010.archive.ensembl.org/biomart/martview/
>>
>> You just select the checkboxes Attributes / Biotype, Chrom, Start, End and
>> click on output to get the lincRNA coordinates.
>
> It's always best to stay synchronised on the same release :)
>
> Human does tend to click over a little bit each release because of updates
> from Havana moving in (though not necessarily each release).
>
> One way to track this is the database extension name, which changes
> when the database contents change:
>
> (this is given as <<global_release>>.<<species_specific>>.
> The species-specific part is usually assemblynumber<<letter>>, where the
> letter is updated when the content changes on the same database)
>
> release 60: 60.37e
> release 59: 59.37d
>
> (so - as 37e != 37d, there has been some content change)
>
> You can get this from the assembly and stats table at:
>
> http://www.ensembl.org/Homo_sapiens/Info/StatsTable?db=core
>
> and the archive site for the 59 release (each page in Ensembl is linked
> to its archives at the bottom of the page)
>
> http://aug2010.archive.ensembl.org/Homo_sapiens/Info/StatsTable?db=core
>
> There is actually even more granularity on whether the content change
> was just Xref or Gene Build as well... but I can't spot that.
>
>> Note that the coordinates from Ensembl and UCSC are not completely
>> compatible: You will need to remove all features on chromosome HSCHR6_* or
>> on chromosome "LRG" (grep -v), prefix all chromosome numbers with "chr"
>> (Excel, gawk, perl) and reorder the columns to get them into GFF or BED
>> format.
>
> We really must make this easier in the future. So silly to have these
> issues. Something for a deeper conversation than this.
>
> If you switch on the biotype to lincRNA, you automatically don't get
> LRGs (arguably LRGs should not be coming out in biomart, but arguably
> they should... hmmm....)
>
> I think there are other haplotypes than HSCHR6_*, right - there is one
> on CHR17 I think, so I am not sure that grep does it all.
> grep -v HSCHR I think.
>
>> cheers
>>
>> Max
>> --
>> Maximilian Haussler
>> Tel: +447574246789
>> http://www.manchester.ac.uk/research/maximilian.haussler/
>>
>> On Thu, Dec 2, 2010 at 10:17 AM, Ewan Birney <[email protected]> wrote:
>>
>>> The Ensembl project explicitly aims to predict long intergenic non-coding
>>> RNAs (lincRNAs) using a similar scheme (i.e. histone modification
>>> patterns) and ESTs/cDNAs without coding potential in both Human and
>>> Mouse. They are explicitly characterised as lincRNAs. Like all our
>>> "predictions", they are biased towards a high specificity set and backed
>>> up by experimental evidence.
>>>
>>> An example is here:
>>>
>>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000245883;r=7:99517494-99522910;t=ENST00000499990
>>>
>>> Looking at the corresponding import of Ensembl into UCSC here:
>>>
>>> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=173968291&o=99517493&t=99522910&g=ensGene&i=ENST00000499990
>>>
>>> This transcript is there, but I can't spot the "biotype" slot here - it is
>>> just marked as non-coding (we have about 20 other non-coding biotypes,
>>> e.g. snoRNAs, miRNAs etc.)
>>>
>>> (Is this true - UCSC guys, would it be possible to get the concept of
>>> BioType in the Ensembl set?)
>>>
>>> Also, the Havana project, which does manual curation, has a large set of
>>> non-coding RNAs. It is both merged in a principled way with the Ensembl
>>> set (i.e. the Ensembl set is a superset of Havana at the point of
>>> release) and available in the UCSC browser.
>>>
>>> The counts of lincRNAs in Ensembl are:
>>>
>>> 1443 - in Human
>>> 407 - in Mouse
>>>
>>> It is probably possible either to download the set from UCSC and the
>>> biotypes from Ensembl and join them with a script, or of course to
>>> download the set from Ensembl. You might like to use our BioMart tool:
>>>
>>> (showing our west coast mirror here)
>>>
>>> http://uswest.ensembl.org/biomart/martview/
>>>
>>> On 2 Dec 2010, at 07:47, Bogdan Tanasa wrote:
>>>
>>>> Dear all,
>>>>
>>>> please could you recommend a track in "Genes and Gene Prediction Tracks"
>>>> that has the highest number (with good accuracy) of known/predicted long
>>>> ncRNAs (lincRNAs, etc.)?
>>>>
>>>> thanks,
>>>>
>>>> Bogdan
>>>> _______________________________________________
>>>> Genome maillist - [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
> -----------------------------------------------------------------
> Ewan Birney. Work: +44 1223 494420
> Email: birney "at" ebi.ac.uk
> Clerical Assistant: shelley "at" ebi.ac.uk
> Please cc shelley for urgent or diary-dependent requests
> -----------------------------------------------------------------
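
The coordinate clean-up described in the thread (drop haplotype and LRG entries, prefix chromosome names with "chr", reorder the columns into BED) might look roughly like the sketch below. It is only an illustration under stated assumptions: the input is taken to be a tab-separated BioMart export with the columns Biotype, Chrom, Start, End in that order, the header names and the two non-lincRNA example rows are made up, and biomart_to_bed is a hypothetical helper name. The start coordinate is shifted by one because BED starts are 0-based while Ensembl coordinates are 1-based, which agrees with the two example URLs in the thread (99517494 in Ensembl, 99517493 at UCSC).

```python
import csv
import io


def biomart_to_bed(lines):
    """Turn BioMart rows (biotype, chrom, start, end) into BED-style tuples.

    Assumes tab-separated input with the columns in that order; this layout
    is an assumption for illustration, not something BioMart guarantees.
    """
    bed_rows = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or truncated lines
        biotype, chrom, start, end = row[:4]
        if not (start.isdigit() and end.isdigit()):
            continue  # skip a header line, if the export has one
        # Equivalent of "grep -v HSCHR" plus dropping LRG entries.
        if chrom.startswith("HSCHR") or chrom.startswith("LRG"):
            continue
        # BED order is chrom, start, end, name; BED start is 0-based.
        bed_rows.append(("chr" + chrom, int(start) - 1, int(end), biotype))
    return bed_rows


# Hypothetical export: only the chromosome 7 coordinates come from the thread.
example = io.StringIO(
    "Biotype\tChrom\tStart\tEnd\n"
    "lincRNA\t7\t99517494\t99522910\n"
    "lincRNA\tHSCHR6_MHC_COX\t100\t200\n"
    "LRG_gene\tLRG_1\t5001\t6000\n"
)
for chrom, start, end, name in biomart_to_bed(example):
    print(f"{chrom}\t{start}\t{end}\t{name}")
```

Filtering on the "HSCHR" prefix rather than "HSCHR6_" follows Ewan's remark that there may be haplotypes on chromosomes other than 6.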
