Dear Daniel,
On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> I am attaching a short anonymized sample file (would a larger data set be helpful?) that
> illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable
> field is the sequence name ("US-123456789-1", "US-123456789-2", etc). While the
> annotation fields appear structured, that part of the information is not reliable.

Thanks, I'll take a look.
We usually index an accession number in addition to the identifier. Is
there any significance in the parts of the ID that could be used as an
accession or a sequence version?
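
To make the question concrete, here is a minimal sketch of how such an ID
might be split for indexing. It assumes, without confirmation, that the
middle part is a document number and the trailing part is a per-document
sequence counter. The function name and the "acc"/"seq" keys are
hypothetical and are not part of dbxflat.

```python
import re

# Assumed layout (not confirmed): two-letter office code, document number,
# then a sequence counter, e.g. "US-123456789-1".
ID_PATTERN = re.compile(r"^(?P<office>[A-Z]{2})-(?P<document>\d+)-(?P<seq>\d+)$")


def split_uspto_id(identifier):
    """Split an ID into a candidate accession and a sequence counter.

    Returns None when the identifier does not match the assumed layout.
    """
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    return {
        "id": identifier,
        # Candidate accession: the ID without its trailing counter.
        "acc": f"{match['office']}-{match['document']}",
        # Trailing counter, which may or may not be meaningful as a version.
        "seq": int(match["seq"]),
    }
```

If the trailing number only counts sequences within one document, it would
distinguish entries rather than act as a sequence version.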

> As for the name, how about something like "iguspto"?

Thanks. I may just use USPTO but it's not important.

> Lastly, do you think the patch with this change would be made available for
> EMBOSS 6.4?

Yes, it is a fairly straightforward extension to dbxflat, so I could
send you a copy, but for general release I would prefer to distribute it
only from 6.5 onwards.

regards,
Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss