Dear Peter,

At least within the context of the USPTO, the sequence identifier is the only
consistently present piece of information that uniquely identifies the
sequence. Does the absence of an accession number field make adding support
for this in EMBOSS more complex?

Thank you,
Daniel

On Sep 19, 2012, at 11:14 AM, "Peter Rice" <ricepet...@yahoo.co.uk> wrote:

> Dear Daniel,
> 
> On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> 
>> I am attaching a short anonymized sample file (would a larger data set be 
>> helpful?) that illustrates the type of IG format in use at USPTO. I believe 
>> that the only reasonably indexable field is the sequence name 
>> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields 
>> appear structured, that part of the information is not reliable.
> 
> Thanks, I'll take a look.
> 
> We usually index an "accession number" in addition to the identifier. Is
> there some significance in the parts of the id naming that could be used 
> as an accession or a sequence version?
> 
>> As for the name, how about something like "iguspto"?
> 
> Thanks. I may just use USPTO but it's not important.
> 
>> Lastly, do you think the patch with this change would be made available for 
>> EMBOSS 6.4?
> 
> Yes ... it is a fairly straightforward extension to dbxflat, so I could
> send you a copy, but for general release I would prefer to distribute it
> only from 6.5 onwards.
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 

_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
