Hello all,

I know less about the Genealogical Data Model than some of you do, but I
can see another set of differences.

GEDCOM and any XML representation of it are designed to store the
information in what you might call a "family file", that is, the kind of
information typically captured by any of the genealogical data storage
programs on the market.  GEDCOM files, and presumably XML representations
of them, are in general highly structured, with an exception for notes and
a few other things where there is much more free text.

The GENTECH data model, as I understand it, could perhaps be applied to
what I consider to be unstructured data, or free form text.

Hans is this year's GENTECH Scholar, and this is his scholarship project.
I was one of the two GENTECH Scholars last year, and would have benefitted
from the work Hans is doing had our projects been reversed in order.

I am tagging unstructured documents (narratives, etc.) with some basic
tags such as names, dates and places.  I presented my project at GENTECH
2002, and also at the Family History Workshop at BYU.

It is my belief that the outcome of Hans's project will be an XML
representation (be it a DTD, a schema, or whatever) that could be used to
tag the same unstructured documents that I am now working with.  His
project is the creation of the XML representation; my project was the use
of Natural Language Processing techniques to tag unstructured documents
with SGML or XML tags in order to facilitate extraction of genealogical
information from those documents.  Because no set of appropriate tags
already existed, I made up my own.  I think my project would have been a
lot better had Hans's XML representation been available to me at the
time.
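
To give a rough idea of what tagging names, dates and places in a
narrative looks like, here is a minimal sketch.  It is not the method
used in either project: it swaps real Natural Language Processing for
tiny hand-made lookup lists and one date pattern, and the tag names
(name, date, place), the lists and the function names are all made up
for illustration.

```python
import re
from xml.sax.saxutils import escape

# Assumption: tiny hand-made lookup lists stand in for real NLP components.
KNOWN_NAMES = ["John Smith", "Mary Jones"]
KNOWN_PLACES = ["Syracuse", "New York"]

# Assumption: only dates written like "3 May 1850" are recognised.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"


def build_pattern():
    """Combine the lists into one regex with a named group per tag."""
    names = "|".join(re.escape(n) for n in KNOWN_NAMES)
    places = "|".join(re.escape(p) for p in KNOWN_PLACES)
    return re.compile(
        r"(?P<name>\b(?:" + names + r")\b)"
        r"|(?P<date>" + DATE_PATTERN + r")"
        r"|(?P<place>\b(?:" + places + r")\b)"
    )


def tag_text(text):
    """Wrap each recognised name, date or place in a hypothetical XML tag."""
    pattern = build_pattern()
    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untagged text so the result stays well-formed XML.
        pieces.append(escape(text[pos:match.start()]))
        tag = match.lastgroup  # name of the group that matched
        pieces.append("<%s>%s</%s>" % (tag, escape(match.group()), tag))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(tag_text("John Smith was born 3 May 1850 in Syracuse."))
```

With an agreed-upon XML representation, the made-up tag names here would
simply be replaced by the ones that representation defines.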

So, I look forward to the outcome of this project, in the hope that I can
one day make use of it.

I am also subscribed to the GenealogyXML list, to which Michael
contributes heavily, but I mostly sit quietly in the background.

-- Mary D. Taffet
   Syracuse University
   Ph.D. Student/School of Information Studies
   Research Analyst/Center for Natural Language Processing
   4-230 Center for Science & Technology
   Syracuse, NY  13244-4100
   E-mail:      [EMAIL PROTECTED]
   Web:         http://web.syr.edu/~mdtaffet/

gdmxml mailing list