Gentlemen, thanks for your comments, they have been very helpful! A thought
though:
does the lincRNA track also take into account the lincRNAs that have been
published in the literature so far? I have also noticed a database of
lncRNAs: http://www.lncrnadb.org, although the set of lncRNAs there looks
rather small at the moment. Thanks very much,

bogdan

=========================
Bogdan Tanasa, MD
TSRI/HHMI
[email protected]


On Fri, Dec 3, 2010 at 3:17 AM, Ewan Birney <[email protected]> wrote:

> On Fri, 3 Dec 2010, Maximilian Haussler wrote:
>
>  Very interesting thread!
>>
>> Bogdan, if you want to combine the data from the two URLs that Ewan sent
>> you, be aware that UCSC is at version 59 of Ensembl and the BioMart link
>> points to version 60, so if Ensembl has changed anything from version 59
>> to version 60 for the human assembly (I don't know how to find this info
>> on the web at the moment), then you might want to use the version 59
>> BioMart at
>> http://aug2010.archive.ensembl.org/biomart/martview/
>>
>> You just select the checkboxes Attributes / Biotype, Chrom, Start, End and
>> click on output to get the lincRNA coordinates.
>>
>>
> It's always best to stay synchronised on the same release :)
>
> Human does tend to click over a little bit each release because of updates
> from Havana moving in (though not necessarily each release).
>
> One way to track this is the database extension name which changes
> when the database contents change:
>
>  (this is given as <<global_release>>.<<species_specific>>.
> The species-specific part is usually assemblynumber<<letter>>, where the
> letter changes whenever the database content changes for the same assembly)
>
>
>   release 60: 60.37e
>   release 59: 59.37d
>
> (so - as 37e != 37d, there has been some content change)
>
> You can get this from the assembly and stats table at:
>
>   http://www.ensembl.org/Homo_sapiens/Info/StatsTable?db=core
>
> and the archive site for the 59 release (each page in Ensembl links
> to its archived versions at the bottom of the page)
>
>   http://aug2010.archive.ensembl.org/Homo_sapiens/Info/StatsTable?db=core
>
> There is actually even more granularity on whether the content change
> was just Xref or Gene Build as well... but I can't spot that.
>
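
The version comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming the version string always has the form <<global_release>>.<<species_specific>> as in "60.37e"; the function names are hypothetical and not part of any Ensembl tool.

```python
def parse_db_version(version):
    """Split a string such as '60.37e' into (60, '37e')."""
    release, species_specific = version.split(".", 1)
    return int(release), species_specific


def content_changed(old_version, new_version):
    """True if the species-specific part differs between two releases."""
    return parse_db_version(old_version)[1] != parse_db_version(new_version)[1]


# The two human versions quoted in this thread.
print(content_changed("59.37d", "60.37e"))  # True, since 37d != 37e
```
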
>
>
>  Note that the coordinates from Ensembl and UCSC are not completely
>> compatible: You will need to remove all features on chromosome HSCHR6_* or
>> on chromosome "LRG" (grep -v), prefix all chromosome numbers with "chr"
>> (Excel, gawk, perl) and reorder the columns to get them into GFF or BED
>> format.
>>
>>
> We really must make this easier in the future. So silly to have these
> issues. Something for a deeper conversation than this.
>
>
> If you set the biotype filter to lincRNA, you automatically don't get
> LRGs (arguably LRGs should not be coming out in BioMart, but arguably
> they should... hmmm....)
>
> I think there are haplotypes other than HSCHR6_*, right? There is one
> on CHR17 I think, so I am not sure that the grep catches them all.
> grep -v HSCHR would, I think.
>
>
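
As a rough sketch of the clean-up steps discussed above (drop haplotype and LRG rows, add the "chr" prefix, reorder the columns into BED), something like the following Python could work. It assumes a tab-separated export with a header row and columns named Biotype, Chrom, Start and End; those header names are hypothetical, so adjust them to whatever the real export uses. It also shifts the start by one, because BED starts are 0-based while Ensembl coordinates are 1-based, which matches the example links in this thread (99517494 in Ensembl, o=99517493 at UCSC).

```python
import csv
import io


def ensembl_rows_to_bed(lines):
    """Convert BioMart-style TSV rows to BED lines, skipping non-UCSC regions."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        chrom = row["Chrom"]
        # Drop haplotype regions (HSCHR6_*, HSCHR17_*, ...) and LRG records.
        if chrom.startswith(("HSCHR", "LRG")):
            continue
        # UCSC names are "chr"-prefixed, and the mitochondrion is chrM.
        ucsc_chrom = "chrM" if chrom == "MT" else "chr" + chrom
        # BED starts are 0-based; Ensembl starts are 1-based.
        start = int(row["Start"]) - 1
        end = int(row["End"])
        yield "\t".join([ucsc_chrom, str(start), str(end), row["Biotype"]])


# Illustrative input: the first data row uses the region of the example gene
# in this thread; the haplotype row has made-up coordinates.
sample = io.StringIO(
    "Biotype\tChrom\tStart\tEnd\n"
    "lincRNA\t7\t99517494\t99522910\n"
    "lincRNA\tHSCHR6_MHC_APD\t100\t200\n"
)
for bed_line in ensembl_rows_to_bed(sample):
    print(bed_line)
```

Unplaced scaffolds are named differently in the two browsers and would need their own mapping; this sketch does not attempt that.
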
>
>  cheers
>>
>> Max
>> --
>> Maximilian Haussler
>> Tel: +447574246789
>> http://www.manchester.ac.uk/research/maximilian.haussler/
>>
>>
>> On Thu, Dec 2, 2010 at 10:17 AM, Ewan Birney <[email protected]> wrote:
>>
>>
>>>
>>> The Ensembl project explicitly aims to predict long intergenic non-coding
>>> RNAs (lincRNAs) using a similar scheme (i.e., histone modification
>>> patterns) and ESTs/cDNAs without coding potential, in both Human and
>>> Mouse. They are explicitly characterised as lincRNAs. Like all our
>>> "predictions", they are biased towards a high-specificity set and backed
>>> up by experimental evidence.
>>>
>>> An example one is here:
>>>
>>>
>>>
>>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000245883;r=7:99517494-99522910;t=ENST00000499990
>>>
>>>
>>> Looking into the corresponding import of Ensembl into UCSC here:
>>>
>>>
>>>
>>> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=173968291&o=99517493&t=99522910&g=ensGene&i=ENST00000499990
>>>
>>> This transcript is there, but I can't spot the "biotype" slot here; it
>>> only shows that it is non-coding (we have about 20 other non-coding
>>> biotypes, e.g., snoRNAs, miRNAs, etc.)
>>>
>>>
>>>
>>> (Is this true? UCSC guys, would it be possible to get the concept of
>>> BioType into the Ensembl set?)
>>>
>>>
>>> Also, the Havana project, which does manual curation, has a large set of
>>> non-coding RNAs. It is merged in a principled way with the Ensembl set
>>> (i.e., the Ensembl set is a superset of Havana at the point of release)
>>> and is also available in the UCSC browser.
>>>
>>>
>>> The counts of lincRNAs in Human and Mouse in Ensembl are:
>>>
>>>   1443 - in Human
>>>
>>>   407 - in Mouse.
>>>
>>>
>>> It is probably possible either to download the transcripts from UCSC and
>>> the biotypes from Ensembl and join them with a script, or of course to
>>> download the set directly from Ensembl. You might like to use our BioMart
>>> tool:
>>>
>>> (showing our west coast mirror here)
>>>
>>> http://uswest.ensembl.org/biomart/martview/
>>>
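
The "script to join" mentioned above might look roughly like this in Python. It is a sketch under assumptions: the UCSC rows are taken to carry the Ensembl transcript ID as their name (as the i=ENST00000499990 parameter in the UCSC link suggests), and the biotype lookup is taken to be a plain dictionary built from an Ensembl export. The function name and the second transcript ID are hypothetical.

```python
def lincrna_transcripts(ucsc_rows, biotype_by_transcript):
    """Keep UCSC rows whose Ensembl transcript ID is annotated as lincRNA."""
    for name, chrom, start, end in ucsc_rows:
        if biotype_by_transcript.get(name) == "lincRNA":
            yield (chrom, start, end, name)


# Illustrative data: the first transcript is the example from this thread,
# the second is a made-up placeholder.
ucsc_rows = [
    ("ENST00000499990", "chr7", 99517493, 99522910),
    ("ENST_HYPOTHETICAL", "chr1", 1000, 2000),
]
biotype_by_transcript = {
    "ENST00000499990": "lincRNA",
    "ENST_HYPOTHETICAL": "snoRNA",
}
print(list(lincrna_transcripts(ucsc_rows, biotype_by_transcript)))
```

Transcripts missing from the biotype lookup are silently dropped here, which is only safe if both downloads come from the same Ensembl release.
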
>>>
>>>
>>>
>>> On 2 Dec 2010, at 07:47, Bogdan Tanasa wrote:
>>>
>>>  Dear all,
>>>>
>>>> please could you recommend a track under "Genes and Gene Prediction
>>>> Tracks" that has the highest number (with good accuracy) of
>>>> known/predicted long ncRNAs (lincRNAs, etc.)?
>>>>
>>>> thanks,
>>>>
>>>> Bogdan
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>
>>>
>>>
>>>
>>
> -----------------------------------------------------------------
> Ewan Birney.  Work:  +44 1223 494420
>             Email:  birney "at" ebi.ac.uk
> Clerical Assistant:  shelley "at" ebi.ac.uk
> Please cc shelley for urgent or diary-dependent requests
> -----------------------------------------------------------------
>
