Hi Peera,
I downloaded the file and stripped off the extra comment lines (two at
the top starting with "#!" and one at the bottom, "##"). I loaded it
into Galaxy as text, and when I attempted to set the datatype to GFF3, I
ran into the same metadata issues.
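If you want to repeat that cleanup step outside Galaxy, here is a minimal Python sketch. It assumes the only lines to drop are ones starting with "#!" and a bare "##" line, as described above. Real directives such as "##gff-version 3" are kept. The function names strip_extra_comments and clean_file are hypothetical, not part of any Galaxy or NCBI tool.

```python
def strip_extra_comments(lines):
    """Drop '#!' comment lines and bare '##' lines; keep everything else.

    Assumption: these are the only non-spec comment lines in the NCBI file.
    Directives such as '##gff-version 3' are NOT bare '##' and are kept.
    """
    kept = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("#!"):
            continue  # extra header comment lines at the top of the file
        if text.strip() == "##":
            continue  # bare '##' line at the bottom of the file
        kept.append(line)
    return kept


def clean_file(in_path, out_path):
    """Write a copy of in_path with the extra comment lines removed."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_extra_comments(src))
```

For example, clean_file("NC_010609.gff", "NC_010609.clean.gff") writes a cleaned copy that can then be uploaded. This only removes comments; it does not fix the "type" problems the validator reports.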
This link at GMOD has the GFF3 format specification:
http://gmod.org/wiki/GFF#GFF3_Format
Bringing the data into spec will be the only solution if you want to use
it. While simple format errors could be corrected by working with the
file in tabular format in Galaxy, more complex errors will likely need
to be fixed before uploading to Galaxy.
The GMOD validation tool can help pinpoint the errors:
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
Enter the FTP URL into the form. When I ran it, the errors appeared to
be with the "type" keywords used (they do not meet the spec):
Line Number Error/Warning
----------- -------------
4 [WARNING] unknown directive (directive: ##Type DNA NC_010609.1)
5 [ERROR] invalid type (type: source)
10 [ERROR] invalid type (type: misc_feature)
11 [ERROR] invalid type (type: misc_feature)
12 [ERROR] invalid type (type: misc_feature)
13 [ERROR] invalid type (type: misc_feature)
14 [ERROR] invalid type (type: misc_feature)
15 [ERROR] invalid type (type: misc_feature)
16 [ERROR] invalid type (type: misc_feature)
17 [ERROR] invalid type (type: misc_feature)
... 158 pages of errors...
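The GFF3 spec constrains column 3 (type) to a Sequence Ontology term or SO accession, which is why GenBank feature keys like "source" and "misc_feature" are rejected. As a rough sketch of the kind of fix needed, the Python below rewrites column 3 using a lookup table. The mapping itself ("source" to "region", "misc_feature" to "sequence_feature") is an assumption on my part; please check each choice against the Sequence Ontology before relying on it. The function name remap_types is hypothetical.

```python
# Assumed mapping from GenBank feature keys to Sequence Ontology terms.
# Verify each entry against the SO before using the output.
TYPE_MAP = {
    "source": "region",
    "misc_feature": "sequence_feature",
}


def remap_types(lines, type_map=TYPE_MAP):
    """Rewrite column 3 (type) of GFF3 feature lines using type_map.

    Comment/directive lines (starting with '#') and lines that do not have
    exactly nine tab-separated columns are passed through unchanged.
    Types not in type_map are left as they are.
    """
    out = []
    for line in lines:
        if line.startswith("#"):
            out.append(line)
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            out.append(line)
            continue
        cols[2] = type_map.get(cols[2], cols[2])
        out.append("\t".join(cols) + "\n")
    return out
```

After remapping, the file should be run through the validator again; other types or attribute problems may still need attention.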
If you have a history with a GFF3 file from the BioPerl script (the one
you used and Peter suggested) that you believe is in spec (it does not
have the above content/errors), that has passed the above validation
test, and that still gives errors with Cufflinks, there could be another
problem. A chromosome naming mismatch between the reference genome and
the reference annotation is a common problem that you can examine first
(all chromosome identifiers in the BAM/SAM results, the GTF/GFF3
annotation, and the reference genome must be identical). If that checks
out, please send a bug report from the failed Cufflinks job (green bug
icon). If your Galaxy account uses a different email address than the
one used for this email, note in the comments that the bug report is
from you. We can help rule out other types of problems that are common
with this tool set.
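To check for a naming mismatch, a small Python sketch like the one below can collect the sequence identifiers from each file and list any that appear in one but not the other. It assumes a GFF3/GTF annotation (identifier in column 1), a FASTA genome (identifier is the first word after ">"), and a SAM header (reference names in the SN: tag of @SQ lines). For a BAM file, the header can be dumped to text first, for example with samtools view -H. All function names here are hypothetical helpers.

```python
def gff_seqids(lines):
    """Collect column-1 sequence IDs from GFF3/GTF feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def fasta_seqids(lines):
    """Collect FASTA IDs: the first whitespace-delimited word after '>'."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def sam_seqids(lines):
    """Collect reference names from SAM '@SQ' header lines (SN: tag)."""
    ids = set()
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\r\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ids.add(field[3:])
    return ids


def compare_ids(annotation_ids, reference_ids):
    """Return (IDs only in the annotation, IDs only in the reference)."""
    return (
        sorted(annotation_ids - reference_ids),
        sorted(reference_ids - annotation_ids),
    )
```

If either list returned by compare_ids is non-empty, the identifiers need to be made identical across the files before running Cufflinks.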
Hopefully this helps; if not, we can work from your bug report.
Best,
Jen
Galaxy team
On 3/4/12 10:34 AM, Hemarajata, Peera wrote:
Dear all,
I’ve been trying to get Galaxy to recognize this GFF from NCBI
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Lactobacillus_reuteri_JCM_1112_uid58875/NC_010609.gff),
but Galaxy failed to recognize the format after I uploaded it. Setting
the datatype manually didn’t work either: I got an “unable to set
metadata” error as soon as I started a Cufflinks run using that GFF. I
have tried to reformat the file several times and even tried using the
popular bp_genbank2gff3.pl script to re-parse the records from the
original GenBank file.
Would anyone kindly look at the NCBI GFF and guide me to a solution to
get this file recognized by Galaxy? I’ve been stuck for a couple of
weeks now and would appreciate some suggestions. Thank you!
Sincerely yours,
Peera Hemarajata, M.D.
Advanced graduate student - Versalovic lab
Department of Molecular Virology and Microbiology - Baylor College of
Medicine
Department of Pathology - Texas Children's Hospital
Suite 830, 8th Floor Feigin Center. Tel: 832-824-8245
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support