Dear Peter,

This is wonderful news that will make a lot of users very happy!

I am attaching a short anonymized sample file (would a larger data set be
helpful?) that illustrates the variant of IG format in use at USPTO. I believe
that the only field that can reasonably be indexed is the sequence name
("US-123456789-1", "US-123456789-2", etc.). Although the annotation fields
appear structured, that part of the information is not reliable.
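
To make the layout concrete, here is a minimal Python sketch (not EMBOSS code) of how
the sequence name could be pulled out of each record. It assumes, as in the attached
sample, that comment lines start with ";", that the first non-comment line is the
name, and that the sequence ends with a trailing "1" (or "2"). The function name
read_ig_records and the NAME_PATTERN convention are hypothetical and serve only as
illustration.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed naming convention, inferred from the sample only:
# US-<application number>-<sequence number>
NAME_PATTERN = re.compile(r"^US-\d+-\d+$")


def read_ig_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str], str]]:
    """Yield (name, comment_lines, sequence) for each IG-style record."""
    comments: List[str] = []
    name = None
    residues: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines separate records in the sample
        if name is None:
            if line.startswith(";"):
                # Annotation line: kept as free text, not parsed further
                comments.append(line[1:].strip())
            else:
                # First non-comment line is the sequence name
                name = line
            continue
        if line[-1] in "12":
            # Trailing 1 or 2 terminates the sequence
            residues.append(line[:-1])
            yield name, comments, "".join(residues)
            comments, name, residues = [], None, []
        else:
            residues.append(line)
    if name is not None:
        # Record without a terminator at end of input
        yield name, comments, "".join(residues)
```

The annotation lines are carried along as plain text only, since they are not
reliable enough to index.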

As for the name, how about something like "iguspto"?

Lastly, do you think a patch with this change could be made available for
EMBOSS 6.4?

With gratitude,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk] 
Sent: Wednesday, September 19, 2012 6:48 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it 
> be to support database indexing for this format?

Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, indicate which fields
could be indexed (including any information in formatted descriptions or in
naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)?

regards,

Peter Rice
EMBOSS Team


; Sequence 1, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 1
;  LENGTH: 178
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-1
HGQGMHKIEAPCGQMFRCTMVKFSDDYNEPIALKIRYARPGTCWYAMVVCEQMVPWISWT
LALTRVAGQVRDSPPFWAWYCEKMQANKPMPWRQTWVAHYAWPENWMNPYNVFGKCHKTD
LGRCWQWWKDITEQLTVCHWMDWGIACQDCLEKTKHGLCHSRAQIMHCGHGGVTGKHA1


; Sequence 2, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 2
;  LENGTH: 500
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-2
KTLNSGAQIALVMTNASRGLPQTSRVLDYREVNRTDSGNYHGDSYRYHEHRVKYESMNKM
CNTLLAFCRPKKMQNTARWHRVDLCMQEYCACPRMFCTVQTHMPWFRSDVGPPWFAARTN
PECSIVDGAVGRKCHEPTTNEVAGCRFECGPVSHEDPIMKWHAVTGHQRSMILILLGPRQ
CGKTTSEIWCHYVHDWAHMQHVTYYTVVDEERMNAFANKNHTNVCKYHPSMLHCVHRLSP
HPPVEYNLKNLKITYMPPNSISNPGITLDNTCLQTACLGSHYSWVMVEMYTRNCYRAPAY
NKAQNSDTWGIQTAVHTANGHEANQEVCIAIIFIGFWAYKHDVWHMTVDEVDGYMPDESV
NGDGGPKKYIEFKCQYWTGFDYDAIGIHVLTRFFRWYEFCLRWQHGKAHIHAPCRDTGHG
ANTLAKAESNPFGAAQSALGWLMDNLCKYLMCNRCAQLNASHWTFWTNPMDQWMCGMLDI
CRPPMLRKGPISDESHTFTD1


; Sequence 3, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 3
;  LENGTH: 61
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-3
GVSGANWCNNEWFNARSGWPAPICTGRFPKVSAYCRLVVMWYAKTFFRYEFAFVHKRTGP
M1


; Sequence 4, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 4
;  LENGTH: 10
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-4
YDAIGIHVLT1
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
