On 19 Dec 2007, at 21:14, Richard Holland wrote:
By far the easiest way to create a BioMart dataset is to use MartBuilder,
point it at your relational database containing your data, and go from
there. MartBuilder needs to be told which table is the main focus of your
dataset - in your case, probably the gene table - and it will then suggest
dimensions etc. and generate the SQL for you to run to build the mart.
Having said that, if your data is in a GMOD-style schema with one table
representing multiple different data types (gene, transcript, etc.) then
it does get a bit harder. Also it gets harder if you want to produce
sequences, as this requires use of the (undocumented) GenomicSequence
modules. If you run into this situation, drop us another email and we'll
do our best to help.
For now, the Ensembl datasets on martdb.ensembl.org port 5316 would be a
good starting point - you can compare the tables in the databases with the
configs (by pointing MartEditor at them) and see how they match up.
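
When comparing the tables in a mart database with its configs, it can help to group the table names by dataset first. The sketch below does that in Python, assuming the usual BioMart double-underscore naming pattern (dataset__content__main for main tables, dataset__content__dm for dimension tables). The table names in the example are illustrative, not a listing from martdb.ensembl.org, and the helper names are made up for this note.

```python
from collections import defaultdict


def parse_mart_table(name):
    """Split 'dataset__content__type' into its parts, or return None.

    Assumes the double-underscore naming pattern; names that do not
    fit it (or whose type is not 'main' or 'dm') are rejected.
    """
    parts = name.split("__")
    if len(parts) != 3 or parts[2] not in ("main", "dm"):
        return None
    return tuple(parts)


def group_by_dataset(table_names):
    """Group table contents under each dataset, split into main and dm."""
    grouped = defaultdict(lambda: {"main": [], "dm": []})
    for name in table_names:
        parsed = parse_mart_table(name)
        if parsed is None:
            continue  # skip tables that do not follow the pattern
        dataset, content, kind = parsed
        grouped[dataset][kind].append(content)
    return dict(grouped)


if __name__ == "__main__":
    # Illustrative names only; a real list would come from SHOW TABLES.
    tables = [
        "hsapiens_gene_ensembl__gene__main",
        "hsapiens_gene_ensembl__transcript__main",
        "hsapiens_gene_ensembl__xref_go__dm",
        "some_other_table",
    ]
    for dataset, kinds in group_by_dataset(tables).items():
        print(dataset, "main:", kinds["main"], "dm:", kinds["dm"])
```

The grouped output can then be read side by side with the dataset configuration that MartEditor shows for the same dataset.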
Hi Amir
If you want a really quick, basic start, try making one table in MySQL,
PostgreSQL or Oracle with at least one row of data in it, and use
MartEditor (MEditor) to create and export a 'naive' configuration. That's
it. You can start making more sophisticated things from there.
If you start from a relational schema then, as Richard suggests, the best
thing is to use MartBuilder (MBuilder) for schema transformation.
If you start from GFF, use the GMOD GFF tools.
If you want Ensembl-like functionality, try a stripped-down version of
the Ensembl mart, which would include the gene, transcript, structure and
sequence tables for one Ensembl species.
If you want more details, drop us an email and we'll try to help you as
much as we can with the less-than-optimally documented bits of the
procedure :)
a.
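
As a rough illustration of the one-table quick start described above, the sketch below creates a single main table with one row. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; for MartEditor the table would have to live in MySQL, PostgreSQL or Oracle, as the message says. The table name fungi__gene__main, the gene_id_key column and the row values are all hypothetical, chosen on the assumption that BioMart expects main tables to end in __main and key columns to end in _key.

```python
import sqlite3


def build_minimal_mart(conn):
    """Create one hypothetical main table and insert a single row."""
    conn.execute(
        "CREATE TABLE fungi__gene__main ("
        " gene_id_key INTEGER PRIMARY KEY,"  # assumed key-column naming
        " gene_name TEXT,"
        " chromosome TEXT,"
        " start_pos INTEGER,"
        " end_pos INTEGER,"
        " strand INTEGER,"
        " description TEXT)"
    )
    # One made-up row is enough for a first, 'naive' configuration.
    conn.execute(
        "INSERT INTO fungi__gene__main VALUES (?, ?, ?, ?, ?, ?, ?)",
        (1, "geneA", "chrI", 1000, 2500, 1, "hypothetical example gene"),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real mart database
    build_minimal_mart(conn)
    for row in conn.execute("SELECT * FROM fungi__gene__main"):
        print(row)
```

The same CREATE TABLE and INSERT statements should need only minor type changes to run against one of the databases named in the thread.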
cheers,
Richard
On Wed, December 19, 2007 8:45 pm, Amir Karger wrote:
Hi.
I'd like to make a Biomart site with some fungal genomes.
I saw lots of information of various types in the User Guide, but not
too much on how to build the tables in the first place. That is, if I
want a stripped-down Biomart (get gene locations, descs, etc., download
sequences of genes possibly with upstream/downstream sequence), then
which Biomart tables do I need to produce?
Looking through the mailing list, I saw the gff2biomart5.pl script from
gmod. Is that a good tool to use?
If someone can point me to an example simple Biomart site that I could
steal from, that would be great. I know enough Perl to create files in
whatever format is necessary; I just need to know what content and
format I need to get Biomart working.
Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University
--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------