Hi,
a few weeks ago I was attending an Open Door Workshop at the Sanger. I
had occasion to speak to one of your team and mention a couple of
problems we regularly encounter when using BioMart. I was advised to
post to this address.
My colleagues and I use BioMart to output gene-related information
for lists of microarray feature IDs. Even though we untick the Ensembl
transcript ID box, we still get one output row for each transcript. In
some cases, where a gene has 9 documented transcripts, we get 9
identical rows. When dealing with lists of over a thousand genes each
time, this gets very confusing and generally makes Excel stop
responding!
We wonder whether, in future reworks of the tool, a gene-specific
rather than a transcript-specific output could be made available. We
are aware that for people working on only one gene, or a handful of
genes, getting all the transcript-specific information is essential.
However, it would make life a lot easier for scientists like us who
handle large gene lists if we could choose to obtain only gene-specific
output: 1 gene = 1 row of output.
Our second major problem is that sometimes there is no information
linked to particular microarray feature IDs. The count tab tells you
how many of the IDs in your list were found, but there is no
information whatsoever about the ones that were not found. Manually
working out which 50 out of a list of 1000 were not found is not easy.
A separate output list of the features not found, or inclusion of the
missing items within the output with a short 'not found' comment next
to them, would be very useful.
In summary, the ideal situation for us would be to input a list of
1000 feature IDs and get back 1000 rows of output, 1 gene per row, in
the same order as the input list, with either empty cells or a
'not found' comment against those not found.
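
To make the request concrete, here is a minimal sketch in Python of
the output described above. It is only an illustration, not a
description of how BioMart works: it assumes the results have been
exported as tab-separated text with a header row, and the function
name one_row_per_feature and the column name feature_id are
hypothetical. It keeps the first row seen for each feature ID, which
is only safe when the repeated rows really are identical.

```python
import csv
import io


def one_row_per_feature(input_ids, export_tsv, id_column="feature_id"):
    """Return one row per input ID, in input order.

    input_ids  -- the feature IDs originally submitted, in order
    export_tsv -- tab-separated export text with a header row (assumed format)
    id_column  -- header of the feature ID column (hypothetical name)
    """
    reader = csv.DictReader(io.StringIO(export_tsv), delimiter="\t")
    other_columns = [name for name in reader.fieldnames if name != id_column]

    # Keep only the first row seen for each feature ID, which drops the
    # replicated per-transcript rows.
    first_row = {}
    for row in reader:
        first_row.setdefault(row[id_column], row)

    result = []
    for feature_id in input_ids:
        row = first_row.get(feature_id)
        if row is None:
            # Feature absent from the export: keep its place in the list.
            result.append([feature_id] + ["not found"] * len(other_columns))
        else:
            result.append([feature_id] + [row[name] for name in other_columns])
    return result
```

Called with the submitted ID list and the exported text, this returns
exactly as many rows as there were input IDs, so the result can be
pasted alongside the original list.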
Apart from these two points, BioMart is great and has made data
mining of large data sets so much more accessible!
Thank you.
Regards
Rosienne
_______________________________________________________
Rosienne Farrugia
Division of Transfusion Medicine
Department of Haematology
University of Cambridge
Long Road
Cambridge
CB2 2PT
Tele: 01223 548008
Fax: 01223 548136