Tim, Keith,
Thanks for the responses on this. Just to add: knowing ahead of time
exactly which characters are escaped for GFF3 helps, particularly when
the generated GFF3 is used for other purposes (for instance when
working with BioPerl/GBrowse, both of which have the specific encoding
character set hard-coded to match the latest GFF3 spec, with no
URI-encoding of spaces).
I'm fine if this follows a particular GFF3 revision (I can always
bring this up with my fellow BioPerl brethren and build some
flexibility into the GFF3 parsers for encoding/decoding issues), but I
could find very little about GFF3 in the Artemis docs, and nothing
specifically regarding the supported GFF3 spec version or
encoding/decoding (the GFF link in the V10 manual points to the current
GFF3 spec, not the older one, hence my confusion). If this were added
somewhere it would help tremendously.
chris
On May 27, 2008, at 4:42 AM, Tim Carver wrote:
Hi Chris, Keith
Thanks for that. It does sound like Keith is correct and that spaces
are optionally encoded within fields. However, at the moment Artemis
isn't encoding new lines, carriage returns and control characters so
those will be added.
Regards
Tim
On 27/5/08 09:57, "Keith James" <[EMAIL PROTECTED]> wrote:
"Chris" == Chris Fields <[EMAIL PROTECTED]> writes:
Chris> I have noticed that Artemis is URI-encoding and decoding
Chris> spaces. The GFF3 spec
Chris> (http://www.sequenceontology.org/gff3.shtml) indicates that
Chris> tab, newline, carriage return, control chars, and ';=%&,'
Chris> are encoded; spaces, quotes, etc. aren't encoded.
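For illustration, the character set described above could be sketched in Python roughly as follows. This is only a sketch of the rule as quoted in this thread, not Artemis or BioPerl code; the names `RESERVED` and `escape_gff3` are made up for the example.

```python
# Hypothetical helper: percent-encode only the characters listed in the
# message above (control characters, which include tab, newline and
# carriage return, plus ';', '=', '%', '&' and ',').
RESERVED = set(";=%&,")

def escape_gff3(value: str) -> str:
    out = []
    for ch in value:
        code = ord(ch)
        # Control characters are taken here to be 0x00-0x1F and 0x7F.
        if ch in RESERVED or code < 0x20 or code == 0x7F:
            out.append(f"%{code:02X}")
        else:
            # Spaces, quotes and everything else are written unchanged.
            out.append(ch)
    return "".join(out)
```

With this rule a value such as `heat shock; "putative"` keeps its spaces and quotes and only the semicolon becomes `%3B`.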
The escaping rules have changed over time. If you look at the GFF3
spec CVS, the rules were relaxed between versions 1.03 and 1.06.
Certainly spaces must be escaped where they are in the seqid
column. The current spec also states that "unescaped spaces are
allowed within fields". I interpret "allowed" to mean "may be
unescaped", i.e. escaping them is optional. This would make sense
because it's backward compatible with pre-relaxation GFF documents
already in circulation.
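If escaping spaces really is optional, a reader can accept both forms by percent-decoding every value after splitting. A minimal Python sketch, assuming a well-formed attributes column (column 9); `parse_attributes` is a hypothetical name, not part of Artemis or any GFF3 library:

```python
from urllib.parse import unquote

def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attributes column and percent-decode each part."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, raw = pair.partition("=")
        # Split on literal commas first, then decode, so that an escaped
        # comma (%2C) inside a value is not mistaken for a separator.
        attrs[unquote(key)] = [unquote(v) for v in raw.split(",")]
    return attrs
```

Here `Note=heat%20shock protein` and `Note=heat shock protein` decode to the same value, so files written under either version of the rules parse identically. Note that `unquote` leaves `+` alone, which matters because GFF3 does not treat `+` as a space.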
Chris> Is there a way to indicate which characters should be
Chris> encoded/decoded for Artemis, or is it possible to change
Chris> this to hew closer to the specification?
My reading of the spec is that the current Artemis behaviour is
correct wrt pre and post v 1.06 specs. The issue is that it's writing
the pre v 1.06 format, which I think is allowed because older formats
do not seem to be deprecated.
_______________________________________________
Artemis-users mailing list
[email protected]
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign