Dear Daniel,

On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> I am attaching a short anonymized sample file (would a larger data set be helpful?) that
> illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable
> field is the sequence name ("US-123456789-1", "US-123456789-2", etc). While the
> annotation fields appear structured, that part of the information is not reliable.

Thanks, I'll take a look.

We usually index an "access number" in addition to the identifier. Is there some significance in the parts of the id naming that could be used as an accession or a sequence version?
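For illustration only: suppose the names turn out to be a prefix followed by a trailing per-sequence number. That is an assumption, and the sample does not confirm it. In that case the prefix could be split off as a candidate accession. The following is a minimal Python sketch. The function name `split_seq_name` and the regular expression are hypothetical and are not part of dbxflat.

```python
import re

# Hypothetical pattern: assumes names look like "US-<patent number>-<sequence index>".
# Illustration only; the real USPTO naming rules are not confirmed here.
SEQ_NAME = re.compile(r"^(?P<acc>[A-Z]{2}-\d+)-(?P<num>\d+)$")


def split_seq_name(name):
    """Split a sequence name into (candidate accession, sequence index).

    Returns None if the name does not match the assumed pattern.
    """
    match = SEQ_NAME.match(name)
    if match is None:
        return None
    return match.group("acc"), int(match.group("num"))


for name in ("US-123456789-1", "US-123456789-2", "not-an-id"):
    print(name, "->", split_seq_name(name))
```

With this split, "US-123456789-1" and "US-123456789-2" would share the candidate accession "US-123456789". Names that do not fit the pattern are left unparsed.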

> As for the name, how about something like "iguspto"?

Thanks. I may just use "USPTO", but it's not important.

> Lastly, do you think the patch with this change would be made available for
> EMBOSS 6.4?

Yes. It is a fairly straightforward extension to dbxflat, so I could send you a copy, but for general release I would prefer to distribute it only from 6.5 onwards.

regards,

Peter Rice
EMBOSS Team
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
