Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaGeographicInformationParser" page has been changed by GauthamGowrishankar:
https://wiki.apache.org/tika/TikaGeographicInformationParser?action=diff&rev1=1&rev2=2

TikaGeographicInformationParser
+ =========================================
+ Apache Tika currently cannot parse the .iso19139 files that are crawled from the ACADIS websites. Prasanth Iyer created an issue for this, TIKA-1479
+ (https://issues.apache.org/jira/browse/TIKA-1479). I am continuing from where Prasanth left off. Progress so far:
+
+ 1. Extract the metadata using the Apache SIS library (Martin has been a great source of support in this regard).
+ 2. Customize the extracted metadata into key/multi-value pairs.
+ 3. The format finalized so far is key1->[val1,val2..] , key2->[val1,val2...].
+
+ I would like suggestions on the progress below.
+
+ The default metadata extracted by Apache SIS is shown below.
+
+ Default Meta Data
+ ------------------------
+
+ Metadata
+   +-Character set…… UTF-8
+   +-Contact
+   ¦   +-Role…… Resource provider
+   ¦   +-Party
+   ¦       +-Name…… UCAR/NCAR - CISL - ACADIS
+   +-Identification info
+   ¦   +-Citation
+   ¦   ¦   +-Title…… Barrow Atqasuk ARCSS Plant
+   ¦   ¦   +-Date (1 of 2)
+   ¦   ¦   ¦   +-Date…… Dec 16, 2013 12:00:00 AM
+   ¦   ¦   ¦   +-Date type…… Creation
+   ¦   ¦   +-Date (2 of 2)
+   ¦   ¦   ¦   +-Date…… Feb 5, 2015 12:00:00 AM
+   ¦   ¦   ¦   +-Date type…… Modified
+   ¦   ¦   +-Cited responsible party
+   ¦   ¦       +-Role…… Point of contact
+   ¦   ¦       +-Party
+   ¦   ¦           +-Name…… Robert Hollister
+   ¦   ¦           +-Contact info
+   ¦   ¦               +-Address
+   ¦   ¦                   +-Electronic mail address…… [email protected]
+   ¦   +-Abstract…… These files contain data representing the periodic plant measures of species within each plot in a text tab delimited format. The data presented are seasonal growth of graminoids (length of leaf and length of inflorescence) and seasonal flowering of all species (number of inflorescences in flower within a plot), collected weekly during the summers of 2012-20XX for a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS grid).
+   ¦   +-Status…… On going
+   ¦   +-Point of contact
+   ¦   ¦   +-Role…… Point of contact
+   ¦   ¦   +-Party
+   ¦   ¦       +-Name…… Robert Hollister
+   ¦   ¦       +-Contact info
+   ¦   ¦           +-Address
+   ¦   ¦               +-Electronic mail address…… [email protected]
+   ¦   +-Resource format
+   ¦   ¦   +-Format specification citation
+   ¦   ¦       +-Alternate title…… Other ASCII
+   ¦   +-Descriptive keywords (1 of 5)
+   ¦   ¦   +-Keyword…… EARTH SCIENCE > BIOSPHERE > TERRESTRIAL ECOSYSTEMS > ALPINE/TUNDRA
+   ¦   ¦   +-Type…… Theme
+   ¦   ¦   +-Thesaurus name
+   ¦   ¦       +-Title…… NASA/GCMD Earth Science Keywords
+   ¦   ¦       +-Alternate title…… Science and Services Keywords
+   ¦   ¦       +-Date
+   ¦   ¦           +-Date…… May 21, 2014 12:00:00 AM
+   ¦   ¦           +-Date type…… Revision
+   ¦   +-Descriptive keywords (2 of 5)
+   ¦   ¦   +-Keyword…… FIELD SURVEY
+   ¦   ¦   +-Type…… Theme
+   ¦   ¦   +-Thesaurus name
+   ¦   ¦       +-Title…… ACADIS Keywords
+   ¦   ¦       +-Alternate title…… Platforms
+   ¦   ¦       +-Date
+   ¦   ¦           +-Date…… Oct 7, 2014 12:00:00 AM
+   ¦   ¦           +-Date type…… Revision
+
+
+ The corresponding customized metadata is shown below.
+ -----------------------------------------------
+
+ CharacterSet-->UTF-8
+ ContactRole-->RESOURCE_PROVIDER
+ ContactPartyName-->UCAR/NCAR - CISL - ACADIS
+ IdentificationInfoCitationTitle-->Barrow Atqasuk ARCSS Plant
+ CitationDateCREATION-->Mon Dec 16 00:00:00 PST 2013
+ CitationDatemodified-->Thu Feb 05 00:00:00 PST 2015
+ CitedResponsiblePartyRole-->Role[POINT_OF_CONTACT]
+ CitedResponsiblePartyName-->Robert Hollister
+ CitedResponsiblePartyOrganizationName-->null
+ CitedResponsiblePartyPositionName-->null
+ CitedResponsiblePartyEMail-->[email protected]
+ IdentificationInfoAbstract-->These files contain data representing the periodic plant measures of species within each plot in a text tab delimited format. The data presented are seasonal growth of graminoids (length of leaf and length of inflorescence) and seasonal flowering of all species (number of inflorescences in flower within a plot), collected weekly during the summers of 2012-20XX for a subset of 30 grid plots at two sites (Barrow ARCSS grid and Atqasuk ARCSS grid).
+ IdentificationInfoStatus-->ON_GOING
+ ResourceFormatSpecificationAlternativeTitle-->Other ASCII
+ IdentificationInfoLanguage-->English
+ IdentificationInfoTopicCategory-->BIOTA
+ DescriptiveKeyWords 1
+ =======================
+ Keywords-->EARTH SCIENCE > BIOSPHERE > TERRESTRIAL ECOSYSTEMS > ALPINE/TUNDRA
+ KeywordsType-->THEME
+ ThesaurusNameTitle-->NASA/GCMD Earth Science Keywords
+ ThesaurusNameAlternativeTitle-->[Science and Services Keywords]
+ ThesaurusNameDateREVISION-->Wed May 21 00:00:00 PDT 2014
+ DescriptiveKeyWords 2
+ =======================
+ Keywords-->FIELD SURVEY
+ KeywordsType-->THEME
+ ThesaurusNameTitle-->ACADIS Keywords
+ ThesaurusNameAlternativeTitle-->[Platforms]
+ ThesaurusNameDateREVISION-->Tue Oct 07 00:00:00 PDT 2014
+
+
+ I definitely feel that the key names could be much shorter; your suggestions would be appreciated.
+
+ Once the format is finalized, I can start integrating it into Apache Tika so that Tika can handle .iso19139 files, which would make it more robust.
+ Feel free to comment.
+
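To make the key1->[val1,val2..] format from step 3 concrete, here is a minimal sketch in plain Java. It is not the parser's actual code: the class `MultiValueMetadata` and its methods are hypothetical names, and it uses only `java.util` collections rather than the SIS or Tika APIs. It also makes two assumptions that go beyond the page: null values (such as the organization name in the sample output) are skipped, and repeated groups such as the two DescriptiveKeyWords blocks are folded into one multi-valued key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika or SIS: a key -> multi-value store.
class MultiValueMetadata {
    // LinkedHashMap keeps keys in insertion order, so output follows extraction order.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Append a value under a key. Assumption: null values (absent fields) are skipped.
    void add(String key, String value) {
        if (value == null) {
            return;
        }
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Return all values stored for a key, or an empty list if the key was never set.
    List<String> get(String key) {
        List<String> found = values.getOrDefault(key, Collections.emptyList());
        return Collections.unmodifiableList(found);
    }

    // Render the store as key1->[val1, val2], key2->[val1].
    String format() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : values.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(entry.getKey()).append("->").append(entry.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MultiValueMetadata md = new MultiValueMetadata();
        md.add("CharacterSet", "UTF-8");
        // Both keyword groups from the sample land under one key.
        md.add("Keywords", "EARTH SCIENCE > BIOSPHERE > TERRESTRIAL ECOSYSTEMS > ALPINE/TUNDRA");
        md.add("Keywords", "FIELD SURVEY");
        // A null field is dropped instead of being stored as the string "null".
        md.add("CitedResponsiblePartyOrganizationName", null);
        System.out.println(md.format());
    }
}
```

Tika's own `org.apache.tika.metadata.Metadata` class already accepts several values per key through `add(name, value)`, so the final parser may not need a separate map like this one; the sketch only illustrates the shape of the data.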
