I'm still getting an empty array back from this:

    Note[] myAccs = ((RichAnnotation) rs.getAnnotation()).getProperties(INSDseqFormat.Terms.getOtherSeqIdTerm());

Here's the file that I'm parsing: ~~~~~~~~~~~~~~~~~~~~~~ <?xml version="1.0"?> <!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" " http://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd"> <INSDSet> <INSDSeq> <INSDSeq_locus>AY069118</INSDSeq_locus> <INSDSeq_length>1502</INSDSeq_length> <INSDSeq_strandedness>single</INSDSeq_strandedness> <INSDSeq_moltype>mRNA</INSDSeq_moltype> <INSDSeq_topology>linear</INSDSeq_topology> <INSDSeq_division>INV</INSDSeq_division> <INSDSeq_update-date>17-DEC-2001</INSDSeq_update-date> <INSDSeq_create-date>15-DEC-2001</INSDSeq_create-date> <INSDSeq_definition>Drosophila melanogaster GH13089 full length cDNA</INSDSeq_definition> <INSDSeq_primary-accession>AY069118</INSDSeq_primary-accession> <INSDSeq_accession-version>AY069118.1</INSDSeq_accession-version> <INSDSeq_other-seqids> <INSDSeqid>gb|AY069118.1|</INSDSeqid> <INSDSeqid>gi|17861571</INSDSeqid> </INSDSeq_other-seqids> <INSDSeq_keywords> <INSDKeyword>FLI_CDNA</INSDKeyword> </INSDSeq_keywords> <INSDSeq_source>Drosophila melanogaster (fruit fly)</INSDSeq_source> <INSDSeq_organism>Drosophila melanogaster</INSDSeq_organism> <INSDSeq_taxonomy>Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila</INSDSeq_taxonomy> <INSDSeq_references> <INSDReference> <INSDReference_reference>1 (bases 1 to 1502)</INSDReference_reference> <INSDReference_position>1..1502</INSDReference_position> <INSDReference_authors> <INSDAuthor>Stapleton,M.</INSDAuthor> <INSDAuthor>Brokstein,P.</INSDAuthor> <INSDAuthor>Hong,L.</INSDAuthor> <INSDAuthor>Agbayani,A.</INSDAuthor> <INSDAuthor>Carlson,J.</INSDAuthor> <INSDAuthor>Champe,M.</INSDAuthor> <INSDAuthor>Chavez,C.</INSDAuthor> <INSDAuthor>Dorsett,V.</INSDAuthor> <INSDAuthor>Farfan,D.</INSDAuthor> <INSDAuthor>Frise,E.</INSDAuthor> <INSDAuthor>George,R.</INSDAuthor> <INSDAuthor>Gonzalez,M.</INSDAuthor> <INSDAuthor>Guarin,H.</INSDAuthor> <INSDAuthor>Li,P.</INSDAuthor> 
<INSDAuthor>Liao,G.</INSDAuthor> <INSDAuthor>Miranda,A.</INSDAuthor> <INSDAuthor>Mungall,C.J.</INSDAuthor> <INSDAuthor>Nunoo,J.</INSDAuthor> <INSDAuthor>Pacleb,J.</INSDAuthor> <INSDAuthor>Paragas,V.</INSDAuthor> <INSDAuthor>Park,S.</INSDAuthor> <INSDAuthor>Phouanenavong,S.</INSDAuthor> <INSDAuthor>Wan,K.</INSDAuthor> <INSDAuthor>Yu,C.</INSDAuthor> <INSDAuthor>Lewis,S.E.</INSDAuthor> <INSDAuthor>Rubin,G.M.</INSDAuthor> <INSDAuthor>Celniker,S.</INSDAuthor> </INSDReference_authors> <INSDReference_title>Direct Submission</INSDReference_title> <INSDReference_journal>Submitted (10-DEC-2001) Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA</INSDReference_journal> </INSDReference> </INSDSeq_references> <INSDSeq_comment>Sequence submitted by: Berkeley Drosophila Genome Project Lawrence Berkeley National Laboratory Berkeley, CA 94720 This clone was sequenced as part of a high-throughput process to sequence clones from Drosophila Gene Collection 1 (Rubin et al., Science 2000). The sequence has been subjected to integrity checks for sequence accuracy, presence of a polyA tail and contiguity within 100 kb in the genome. Thus we believe the sequence to reflect accurately this particular cDNA clone. However, there are artifacts associated with the generation of cDNA clones that may have not been detected in our initial analyses such as internal priming, priming from contaminating genomic DNA, retained introns due to reverse transcription of unspliced precursor RNAs, and reverse transcriptase errors that result in single base changes. 
For further information about this sequence, including its location and relationship to other sequences, please visit our Web site (http://fruitfly.berkeley.edu) or send email to [EMAIL PROTECTED]</INSDSeq_comment> <INSDSeq_feature-table> <INSDFeature> <INSDFeature_key>source</INSDFeature_key> <INSDFeature_location>1..1502</INSDFeature_location> <INSDFeature_intervals> <INSDInterval> <INSDInterval_from>1</INSDInterval_from> <INSDInterval_to>1502</INSDInterval_to> <INSDInterval_accession>AY069118.1</INSDInterval_accession> </INSDInterval> </INSDFeature_intervals> <INSDFeature_quals> <INSDQualifier> <INSDQualifier_name>organism</INSDQualifier_name> <INSDQualifier_value>Drosophila melanogaster</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>mol_type</INSDQualifier_name> <INSDQualifier_value>mRNA</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>strain</INSDQualifier_name> <INSDQualifier_value>y; cn bw sp</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>db_xref</INSDQualifier_name> <INSDQualifier_value>taxon:7227</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>map</INSDQualifier_name> <INSDQualifier_value>39B3-39B3</INSDQualifier_value> </INSDQualifier> </INSDFeature_quals> </INSDFeature> <INSDFeature> <INSDFeature_key>gene</INSDFeature_key> <INSDFeature_location>1..1502</INSDFeature_location> <INSDFeature_intervals> <INSDInterval> <INSDInterval_from>1</INSDInterval_from> <INSDInterval_to>1502</INSDInterval_to> <INSDInterval_accession>AY069118.1</INSDInterval_accession> </INSDInterval> </INSDFeature_intervals> <INSDFeature_quals> <INSDQualifier> <INSDQualifier_name>gene</INSDQualifier_name> <INSDQualifier_value>E2f2</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>note</INSDQualifier_name> <INSDQualifier_value>alignment with genomic scaffold AE003669</INSDQualifier_value> </INSDQualifier> <INSDQualifier> 
<INSDQualifier_name>db_xref</INSDQualifier_name> <INSDQualifier_value>FLYBASE:FBgn0024371</INSDQualifier_value> </INSDQualifier> </INSDFeature_quals> </INSDFeature> <INSDFeature> <INSDFeature_key>CDS</INSDFeature_key> <INSDFeature_location>189..1301</INSDFeature_location> <INSDFeature_intervals> <INSDInterval> <INSDInterval_from>189</INSDInterval_from> <INSDInterval_to>1301</INSDInterval_to> <INSDInterval_accession>AY069118.1</INSDInterval_accession> </INSDInterval> </INSDFeature_intervals> <INSDFeature_quals> <INSDQualifier> <INSDQualifier_name>gene</INSDQualifier_name> <INSDQualifier_value>E2f2</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>note</INSDQualifier_name> <INSDQualifier_value>Longest ORF</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>codon_start</INSDQualifier_name> <INSDQualifier_value>1</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>transl_table</INSDQualifier_name> <INSDQualifier_value>1</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>product</INSDQualifier_name> <INSDQualifier_value>GH13089p</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>protein_id</INSDQualifier_name> <INSDQualifier_value>AAL39263.1</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>db_xref</INSDQualifier_name> <INSDQualifier_value>GI:17861572</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>db_xref</INSDQualifier_name> <INSDQualifier_value>FLYBASE:FBgn0024371</INSDQualifier_value> </INSDQualifier> <INSDQualifier> <INSDQualifier_name>translation</INSDQualifier_name> 
<INSDQualifier_value>MYKRKTASIVKRDSSAAGTTSSAMMMKVDSAETSVRSQSYESTPVSMDTSPDPPTPIKSPSNSQSQSQPGQQRSVGSLVLLTQKFVDLVKANEGSIDLKAATKILDVQKRRIYDITNVLEGIGLIDKGRHCSLVRWRGGGFNNAKDQENYDLARSRTNHLKMLEDDLDRQLEYAQRNLRYVMQDPSNRSYAYVTRDDLLDIFGDDSVFTIPNYDEEVDIKRNHYELAVSLDNGSAIDIRLVTNQGKSTTNPHDVDGFFDYHRLDTPSPSTSSHSSEDGNAPACAGNVITDEHGYSCNPGMKDEMKLLENELTAKIIFQNYLSGHSLRRFYPDDPNLENPPLLQLNPPQEDFNFALKSDEGICELFDVQCS</INSDQualifier_value> </INSDQualifier> </INSDFeature_quals> </INSDFeature> </INSDSeq_feature-table> <INSDSeq_sequence>AAGAATAGAGGGAGAATGAAAAAAATGACATAAATGGCGGAAAGCAAACCTAGCGCCAACATTCGTATTTTCGTTTAATTTTCGCTCCAAAGTGCAATTAATTCCGGCTTCTTGATCGCTGCATATTGAGTGCAGCCACGCAAAGAGTTACAAGGACAGGAGTATAGTCATCGAGTCGATTGCGGACCATGTACAAGCGCAAAACCGCGAGTATTGTTAAAAGAGACAGCTCCGCAGCGGGCACCACCTCCTCGGCTATGATGATGAAGGTGGATTCGGCTGAGACTTCGGTCCGGTCGCAGAGCTACGAGTCTACACCCGTTAGCATGGACACATCACCGGATCCTCCAACGCCAATCAAGTCTCCGTCGAATTCACAATCGCAATCGCAGCCTGGACAACAGCGCTCCGTGGGCTCACTGGTCCTGCTCACACAGAAGTTTGTGGATCTCGTGAAGGCCAACGAAGGATCCATCGACCTGAAAGCGGCAACCAAAATCTTGGACGTACAGAAGCGCCGAATATACGATATTACCAATGTTTTAGAGGGCATTGGACTAATTGATAAGGGCAGACACTGCTCCCTAGTGCGCTGGCGCGGAGGGGGCTTTAACAATGCCAAGGACCAAGAGAACTACGACCTGGCACGTAGCCGGACTAATCATTTGAAAATGTTGGAGGATGACCTAGACAGGCAACTGGAGTATGCACAGCGCAATCTGCGCTACGTTATGCAGGATCCCTCGAATAGGTCGTATGCATATGTGACACGTGATGATCTGCTGGACATCTTTGGAGATGATTCCGTATTCACAATACCTAATTATGACGAGGAAGTAGATATCAAGCGTAATCATTACGAGCTGGCCGTGTCGCTGGACAATGGCAGCGCAATTGACATTCGCCTGGTGACGAACCAAGGAAAGAGTACTACAAATCCGCACGATGTGGATGGGTTCTTTGACTATCAC! 
CGTCTGGACACGCCCTCACCCTCGACGTCGTCGCACTCCAGCGAGGATGGTAACGCTCCAGCATGCGCGGGGAACGTGATCACCGACGAGCACGGTTACTCGTGCAATCCCGGGATGAAAGATGAGATGAAACTTTTGGAGAACGAGCTGACGGCCAAGATAATCTTCCAAAATTATCTGTCCGGTCATTCGCTGCGGCGATTTTATCCCGATGATCCGAATCTAGAAAACCCGCCGCTGCTGCAGCTGAATCCTCCGCAGGAAGACTTCAACTTTGCGTTAAAAAGCGACGAAGGTATTTGCGAGCTGTTTGATGTTCAGTGCTCCTAACTGTGGAAGGGGATGTACACCTTAGGACTATAGCTACACTGCAACTGGCCGCGTGCATTGTGCAAATATTTATGATTAGTACAATTTTGACTTTGGATTTCTCTATATCGTCTAGAAATTTTTAATTAGTGTAATACCTTGTAATTTCGCAAATAACAGCAAAACCAATAAATTCGTAAATGCAAAAAAAAAAAAAAAAAA</INSDSeq_sequence> </INSDSeq> </INSDSet> ~~~~~~~~~~~~~~~~~~~~~~ On 6/8/06, Richard Holland <[EMAIL PROTECTED]> wrote: > > Yesterday I think I said I was going to add other-seqids but I forgot to > do it, so I did it just now. Try it and see. Use the new > INSDseqFormat.Terms.getOtherSeqIdTerm() term to find them. > > cheers, > Richard > > On Wed, 2006-06-07 at 19:48 -0400, Seth Johnson wrote: > > Hi Richard, > > > > I still cannot locate the GI number for the main sequence. After I > > parse it with readINSDseqDNA, I then use: > > > > Note [] myAccs = ((RichAnnotation)rs.getAnnotation > > ()).getProperties(Terms.getAdditionalAccessionTerm ()); > > > > However, the 'myAccs' appears to be empty. Am I on the wrong track to > > get to other-seqids??? > > > > On 6/6/06, Richard Holland <[EMAIL PROTECTED]> wrote: > > GenBank has a separate line for GI number, so it can be parsed > > out > > nicely. INSDseq does not, so you have to rely on the other- > > seqids tag > > and hope that one of them is the GI number. However it seems I > > have not > > included that tag in the parser, so I will include it. This > > will make > > the other-seqids values available through the notes with the > > term > > Terms.getAdditionalAccessionTerm(), but getIdentifier() will > > remain > > null. > > > > For your second question, the tutorial makes the mistake in > > several > > places of saying getNoteSet(Terms.blahblah()). 
> > This was shorthand for:
> >
> >     rs.getAnnotation().getProperty(Terms.blahblah())
> >     (for single values)
> >
> > or
> >
> >     ((RichAnnotation)rs.getAnnotation()).getProperties(Terms.blahblah())
> >     (for multiple values)
> >
> > but never got expanded. Maybe someone can fix that one day... :)
> >
> > I'm just updating INSDseq to 1.4 now. The guys next door gave me the
> > details of the changes, and told me that 1.3 is actually no longer
> > supported by them after Friday this week! So I'll make it 1.4 only.
> >
> > cheers,
> > Richard
>
> --
> Richard Holland (BioMart Team)
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> UNITED KINGDOM
> Tel: +44-(0)1223-494416

--
Best Regards,
Seth Johnson
Senior Bioinformatics Associate
Ph: (202) 470-0900
Fx: (775) 251-0358
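
As a cross-check that does not depend on the BioJava parser at all, the other-seqids values can be read straight from the XML with the JDK's built-in DOM parser. The following is only a minimal sketch of that idea, not BioJava code: the class name OtherSeqIds and the methods otherSeqIds() and giNumber() are made up for illustration, and it assumes the GI entry always has the form "gi|<number>" as in the file above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of BioJava.
class OtherSeqIds {

    /** Returns the text of every INSDSeqid element found in the given INSDSeq XML. */
    public static List<String> otherSeqIds(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve any external DTD to an empty document so parsing needs no network access.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("INSDSeqid");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    /** Picks the first id of the assumed form "gi|<number>" and returns the number part. */
    public static Optional<String> giNumber(List<String> ids) {
        return ids.stream()
                .filter(id -> id.startsWith("gi|"))
                .map(id -> id.substring("gi|".length()))
                .findFirst();
    }

    public static void main(String[] args) throws Exception {
        // Cut-down copy of the record above, keeping only the other-seqids block.
        String xml = "<?xml version=\"1.0\"?>"
                + "<INSDSet><INSDSeq>"
                + "<INSDSeq_other-seqids>"
                + "<INSDSeqid>gb|AY069118.1|</INSDSeqid>"
                + "<INSDSeqid>gi|17861571</INSDSeqid>"
                + "</INSDSeq_other-seqids>"
                + "</INSDSeq></INSDSet>";

        List<String> ids = otherSeqIds(xml);
        System.out.println(ids);                                 // [gb|AY069118.1|, gi|17861571]
        System.out.println(giNumber(ids).orElse("no GI found")); // 17861571
    }
}
```

Note that getElementsByTagName collects the ids of every record in the file, so for a multi-record INSDSet the lookup would have to be done per INSDSeq element instead.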
