Dear Peter,

This is most wonderful news that's going to make a bunch of users really happy!
I am attaching a short anonymized sample file (would a larger data set be helpful?) that illustrates the type of IG format in use at USPTO. I believe that the only reasonably indexable field is the sequence name ("US-123456789-1", "US-123456789-2", etc.). While the annotation fields appear structured, that part of the information is not reliable.

As for the name, how about something like "iguspto"?

Lastly, do you think the patch with this change would be made available for EMBOSS 6.4?

With gratitude,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk]
Sent: Wednesday, September 19, 2012 6:48 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it
> be to support database indexing for this format?

Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, and indicate which fields could be indexed (including any information in formatted descriptions or in naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)

regards,

Peter Rice
EMBOSS Team

; Sequence 1, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 1
; LENGTH: 178
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-1
HGQGMHKIEAPCGQMFRCTMVKFSDDYNEPIALKIRYARPGTCWYAMVVCEQMVPWISWT
LALTRVAGQVRDSPPFWAWYCEKMQANKPMPWRQTWVAHYAWPENWMNPYNVFGKCHKTD
LGRCWQWWKDITEQLTVCHWMDWGIACQDCLEKTKHGLCHSRAQIMHCGHGGVTGKHA1
; Sequence 2, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 2
; LENGTH: 500
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-2
KTLNSGAQIALVMTNASRGLPQTSRVLDYREVNRTDSGNYHGDSYRYHEHRVKYESMNKM
CNTLLAFCRPKKMQNTARWHRVDLCMQEYCACPRMFCTVQTHMPWFRSDVGPPWFAARTN
PECSIVDGAVGRKCHEPTTNEVAGCRFECGPVSHEDPIMKWHAVTGHQRSMILILLGPRQ
CGKTTSEIWCHYVHDWAHMQHVTYYTVVDEERMNAFANKNHTNVCKYHPSMLHCVHRLSP
HPPVEYNLKNLKITYMPPNSISNPGITLDNTCLQTACLGSHYSWVMVEMYTRNCYRAPAY
NKAQNSDTWGIQTAVHTANGHEANQEVCIAIIFIGFWAYKHDVWHMTVDEVDGYMPDESV
NGDGGPKKYIEFKCQYWTGFDYDAIGIHVLTRFFRWYEFCLRWQHGKAHIHAPCRDTGHG
ANTLAKAESNPFGAAQSALGWLMDNLCKYLMCNRCAQLNASHWTFWTNPMDQWMCGMLDI
CRPPMLRKGPISDESHTFTD1
; Sequence 3, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 3
; LENGTH: 61
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-3
GVSGANWCNNEWFNARSGWPAPICTGRFPKVSAYCRLVVMWYAKTFFRYEFAFVHKRTGP
M1
; Sequence 4, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 4
; LENGTH: 10
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-4
YDAIGIHVLT1
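
For illustration only, here is a minimal Python sketch of what indexing on the sequence name could look like for a file laid out like the sample: ";" comment lines, one name line, then sequence lines with the last one ending in "1" (or "2"). This is not how EMBOSS implements its indexing; the function names index_ig and fetch_sequence are hypothetical, and the annotation fields are deliberately skipped because they are not reliable.

```python
import io


def index_ig(handle):
    """Map each sequence name to the byte offset where its record starts.

    Assumes the record layout of the sample file: ';' comment lines,
    one name line, then sequence lines, the last ending in '1' or '2'.
    `handle` must be a binary, seekable file object.
    """
    index = {}
    record_start = None  # offset of the first line of the current record
    name = None          # name of the current record, once it has been read
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        text = line.decode("ascii").strip()
        if not text:
            continue
        if record_start is None:
            record_start = offset
        if text.startswith(";"):
            continue  # annotation line: not indexed
        if name is None:
            name = text
            index[name] = record_start
        elif text[-1] in "12":
            # Terminator reached: the next non-blank line opens a new record.
            record_start = None
            name = None
    return index


def fetch_sequence(handle, offset):
    """Read the record starting at `offset` and return (name, sequence)."""
    handle.seek(offset)
    name = None
    parts = []
    for line in iter(handle.readline, b""):
        text = line.decode("ascii").strip()
        if not text or text.startswith(";"):
            continue
        if name is None:
            name = text
            continue
        if text[-1] in "12":
            parts.append(text[:-1])  # drop the terminator character
            break
        parts.append(text)
    return name, "".join(parts)
```

With a file opened in binary mode (for example open(path, "rb"), where path is whatever the sample is saved as), index_ig(handle) would return entries such as "US-123456789-1", and fetch_sequence(handle, index["US-123456789-4"]) would jump straight to that record without rereading the earlier ones.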
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss