RE: [gdmxml] more thoughts on entering a source

2002-07-12 Thread Beau Sharbrough

Hans,

This has been a fruitful discussion, I think. If I may offer a few
thoughts from an LWG perspective (even though Velke and Anderson know a great
deal more about it than I do):

* The GDM was never meant to be a database design. I know that you've said
that many times, but it bears repeating. In this case it's useful to repeat
because you are concerned about redundant storage, and the LWG was not
thinking about storage. At the same time, they were thinking about the
relationships between entities, and perhaps this is one that can be
decomposed.

If we oversimplify (because that helps me understand), let's instantiate
some of these classes.

Repository - Library.
Source - Book.

In theory, if I associate a book with a library I am describing their
collection. I could associate a lot of sources with a repository, including
call numbers and their condition, without being involved in a genealogical
search. I'm not certain, but I think that this association might best be
referred to as a CATALOG, which is a well-established model for that
association.

I think that the LWG may have assumed that all linking of sources to
repositories would take place as the result of a research activity, hence
the inclusion of activity in this association of sources and repositories.

On reflection, it seems reasonable to have two separate associations: one
of SOURCE to REPOSITORY (called CATALOG?), and another of ACTIVITY to
SOURCE-REPOSITORY (or CATALOG).
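To make the two associations concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (catalog, search, call_number, item_condition) and the sample values are assumptions for illustration, not names taken from the GDM. The point is that the call number and condition live once in the catalog row, however many searches refer to it.

```python
import sqlite3

# Minimal sketch of the two separate associations. All table and column
# names here (catalog, search, call_number, item_condition) are
# illustrative assumptions, not names defined by the GDM.
SCHEMA = """
CREATE TABLE repository (repository_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source     (source_id     INTEGER PRIMARY KEY, title TEXT);

-- Association 1: SOURCE to REPOSITORY ("catalog"). It can be filled in
-- with no research activity at all, e.g. from a library's own catalog.
CREATE TABLE catalog (
    catalog_id     INTEGER PRIMARY KEY,
    repository_id  INTEGER NOT NULL REFERENCES repository,
    source_id      INTEGER NOT NULL REFERENCES source,
    call_number    TEXT,
    item_condition TEXT
);

-- Association 2: ACTIVITY to CATALOG. Any number of searches can point
-- at the same catalog entry.
CREATE TABLE search (
    activity_id INTEGER PRIMARY KEY,
    catalog_id  INTEGER NOT NULL REFERENCES catalog,
    notes       TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO repository VALUES (1, 'Allen County Public Library')")
conn.execute("INSERT INTO source VALUES (1, 'US Census film')")
# Catalogued once, independent of any search (hypothetical call number)...
conn.execute("INSERT INTO catalog VALUES (1, 1, 1, 'X-123', 'good')")
# ...then searched twice; the call number is not repeated.
conn.executemany("INSERT INTO search VALUES (?, 1, ?)",
                 [(10, "first pass"), (11, "second pass")])

rows = conn.execute("""
    SELECT s.activity_id, c.call_number
    FROM search AS s JOIN catalog AS c USING (catalog_id)
    ORDER BY s.activity_id
""").fetchall()
print(rows)  # [(10, 'X-123'), (11, 'X-123')]
```

Because catalog rows do not depend on search rows, a catalog could in principle be loaded from a repository's own published data before any research is done.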

I don't think that the LWG ever imagined that the Allen County Public
Library would publish an electronic catalog compatible with a
GDM-compatible client. Hey, it was 1996.

Now it doesn't seem so far-fetched that a GDM-compatible client could
contain links to online catalogs, assuming those catalogs aren't revised
in ways that break the links.

Does that complicate the issue sufficiently?

Beau


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Hans Fugal
Sent: Wednesday, July 10, 2002 11:08 PM
To: [EMAIL PROTECTED]
Subject: Re: [gdmxml] more thoughts on entering a source


I spent a while wrestling this out with my brother Jacob today. There
are situations where one would need to know more than just
repository-id and source-id. For instance, if a particular repository
had more than one copy of a source and you wanted to indicate which one
you had searched, repository-id and source-id would not be sufficient; you
would also need to know the call-number. But the call-number itself is
not unique, so it can't be used as the primary key in repository-source.
Using activity-id as the third key doesn't seem to work, though, because
of the extreme redundancy I pointed out. I think repository-source needs
an id field as a primary key; then search can reference that
repository-source-id instead of having repository-id and source-id, and
we can take activity-id out of repository-source.

Jacob also helped me see the light on these associative tables (like
repository-source and source-group-source). While I understood their
importance in a database context, I was tempted to collapse them a bit
in an XML context. While that's possible to do without losing data
integrity, it is better to keep them separate.

As always, I welcome your feedback...
hans/

* Stan Mitchell [Tue,  9 Jul 2002 at 23:12 -0700]
quote
 Yes, it does seem that your suggestion reduces redundancy
 without sacrificing search capability.

 Hans Fugal wrote:

 But then you have to store call-numbers possibly many times. For
 example, a professional researcher would doubtless perform many searches
 in any particular US Census. For that Census the repository, source, call
 number and description would all be the same for every repository-source
 record. The only unique information in each record would be the
 activity-id. Yet if we take out the activity-id from repository-source
 we get rid of that redundancy. AFAICS there is no loss of querying power
 when we do so - search has all three keys, so if you want to know which
 searches you did on a particular call-number, you only have to query the
 search table with the repository-id and source-id.  Or am I still
 missing something?
 
 


 ___
 gdmxml mailing list
 [EMAIL PROTECTED]
 http://fugal.net/cgi-bin/mailman/listinfo/gdmxml
/quote

--
Everybody is talking about the weather but nobody does anything about it.
-- Mark Twain




Re: [gdmxml] more thoughts on entering a source

2002-07-12 Thread Hans Fugal

Goodness, I must be getting bounce-happy. Sorry about that. Good thing I
didn't expose any secret passwords or anything...

Hans :)

* Hans Fugal [Fri, 12 Jul 2002 at 10:56 -0600]
quote
 Hi Beau, I will write more later - I have to get out the door in a
 minute.  Was this intended to be off-list?  May I bounce it to the list?
 
 Hans :)
/quote




Re: [gdmxml] more thoughts on entering a source

2002-07-12 Thread Hans Fugal

Actually, it clarifies it in my mind. I think what you described is
exactly what I was thinking. I like the name catalog as well.

I think that for a database implementation (which the GDM itself is not),
and also for this XML implementation (although it is not strictly necessary
there), I will go with that model, since it can always be recomposed into
the original GDM.
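As a sketch of how the catalog model can be recomposed into the original GDM, a SQL view can join catalog entries to searches and yield rows carrying the same three ids as the GDM's repository-source. The table, view and column names and the sample data are assumptions; this uses Python's sqlite3 module.

```python
import sqlite3

# Sketch: recomposing the original GDM repository-source (repository-id,
# source-id and activity-id together) from the split catalog/search model.
# Table, view and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (
    catalog_id    INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL,
    source_id     INTEGER NOT NULL,
    call_number   TEXT,
    description   TEXT
);
CREATE TABLE search (
    activity_id INTEGER PRIMARY KEY,
    catalog_id  INTEGER REFERENCES catalog
);
-- One row per (catalog entry, search), shaped like the GDM's
-- repository-source. A catalog entry never searched still appears,
-- with a NULL activity_id.
CREATE VIEW gdm_repository_source AS
SELECT c.repository_id, c.source_id, s.activity_id,
       c.call_number, c.description
FROM catalog AS c LEFT JOIN search AS s ON s.catalog_id = c.catalog_id;

INSERT INTO catalog VALUES (1, 7, 42, '929.3 C76', 'good');
INSERT INTO catalog VALUES (2, 7, 43, NULL, NULL);
INSERT INTO search VALUES (100, 1);
INSERT INTO search VALUES (101, 1);
""")

rows = conn.execute(
    "SELECT * FROM gdm_repository_source ORDER BY source_id, activity_id"
).fetchall()
for row in rows:
    print(row)
# (7, 42, 100, '929.3 C76', 'good')
# (7, 42, 101, '929.3 C76', 'good')
# (7, 43, None, None, None)
```

The stored tables hold the call number once; the view repeats it only when read, so the GDM's shape is available without the redundant storage.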

hans/

Beau Sharbrough wrote:
 Does that complicate the issue sufficiently?
 

Re: [gdmxml] more thoughts on entering a source

2002-07-10 Thread Hans Fugal

I spent a while wrestling this out with my brother Jacob today. There
are situations where one would need to know more than just
repository-id and source-id. For instance, if a particular repository
had more than one copy of a source and you wanted to indicate which one
you had searched, repository-id and source-id would not be sufficient; you
would also need to know the call-number. But the call-number itself is
not unique, so it can't be used as the primary key in repository-source.
Using activity-id as the third key doesn't seem to work, though, because
of the extreme redundancy I pointed out. I think repository-source needs
an id field as a primary key; then search can reference that
repository-source-id instead of having repository-id and source-id, and
we can take activity-id out of repository-source.
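A sketch of this proposal, using Python's sqlite3 module with assumed table and column names and made-up data. The two copies deliberately share a call number, which is the case where only a surrogate repository-source id can tell them apart.

```python
import sqlite3

# Sketch of the proposal: repository-source gets its own surrogate key,
# search points at that key, and activity-id no longer appears in
# repository-source. Names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository_source (
    repository_source_id INTEGER PRIMARY KEY,  -- surrogate key
    repository_id        INTEGER NOT NULL,
    source_id            INTEGER NOT NULL,
    call_number          TEXT,                 -- not unique on its own
    description          TEXT
);
CREATE TABLE search (
    activity_id          INTEGER PRIMARY KEY,
    repository_source_id INTEGER NOT NULL REFERENCES repository_source
);
""")

# Two copies of the same source in the same repository, sharing a call
# number, so (repository_id, source_id, call_number) cannot pick one out.
conn.executemany(
    "INSERT INTO repository_source VALUES (?, ?, ?, ?, ?)",
    [(1, 7, 42, "929.3 C76", "copy 1, good"),
     (2, 7, 42, "929.3 C76", "copy 2, water damage")],
)
conn.executemany("INSERT INTO search VALUES (?, ?)",
                 [(100, 2), (101, 2), (102, 1)])

def copies_searched(conn, repository_id, source_id):
    """Return (activity_id, description) for each search of any copy."""
    return conn.execute("""
        SELECT s.activity_id, rs.description
        FROM search AS s
        JOIN repository_source AS rs USING (repository_source_id)
        WHERE rs.repository_id = ? AND rs.source_id = ?
        ORDER BY s.activity_id
    """, (repository_id, source_id)).fetchall()

print(copies_searched(conn, 7, 42))
# [(100, 'copy 2, water damage'), (101, 'copy 2, water damage'), (102, 'copy 1, good')]
```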

Jacob also helped me see the light on these associative tables (like
repository-source and source-group-source). While I understood their
importance in a database context, I was tempted to collapse them a bit
in an XML context. While that's possible to do without losing data
integrity, it is better to keep them separate.
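One possible way to keep repository-source as its own element in the XML, sketched with Python's xml.etree.ElementTree. The element and attribute names (gdm, repository-ref, source-ref, call-number) are hypothetical; the check simply confirms that every reference resolves, which is part of the data integrity that separate associative elements make easy to verify.

```python
import xml.etree.ElementTree as ET

# Hypothetical gdmxml fragment: element and attribute names are
# assumptions for illustration. repository-source is kept as its own
# element instead of being folded into <source> or <repository>.
DOC = """
<gdm>
  <repository id="fhl"/>
  <repository id="acpl"/>
  <source id="film0049002"/>
  <repository-source id="rs1" repository-ref="fhl" source-ref="film0049002"
                     call-number="0049002"/>
  <repository-source id="rs2" repository-ref="acpl" source-ref="film0049002"/>
</gdm>
"""

def dangling_refs(root: ET.Element) -> list[str]:
    """Return the ids of repository-source elements whose references do
    not resolve to an existing repository or source element."""
    repos = {r.get("id") for r in root.iter("repository")}
    sources = {s.get("id") for s in root.iter("source")}
    bad = []
    for rs in root.iter("repository-source"):
        if rs.get("repository-ref") not in repos or rs.get("source-ref") not in sources:
            bad.append(rs.get("id"))
    return bad

root = ET.fromstring(DOC.strip())
# A source held by two repositories needs no duplicated <source> element.
held_by = [rs.get("repository-ref") for rs in root.iter("repository-source")
           if rs.get("source-ref") == "film0049002"]
print(held_by)              # ['fhl', 'acpl']
print(dangling_refs(root))  # []
```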

As always, I welcome your feedback...
hans/

* Stan Mitchell [Tue,  9 Jul 2002 at 23:12 -0700]
quote
 Yes, it does seem that your suggestion reduces redundancy
 without sacrificing search capability.
 
 Hans Fugal wrote:
 
 But then you have to store call-numbers possibly many times. For
 example, a professional researcher would doubtless perform many searches
 in any particular US Census. For that Census the repository, source, call
 number and description would all be the same for every repository-source
 record. The only unique information in each record would be the
 activity-id. Yet if we take out the activity-id from repository-source
 we get rid of that redundancy. AFAICS there is no loss of querying power
 when we do so - search has all three keys, so if you want to know which
 searches you did on a particular call-number, you only have to query the
 search table with the repository-id and source-id.  Or am I still
 missing something?
  
 
 
 
/quote




Re: [gdmxml] more thoughts on entering a source

2002-07-09 Thread Hans Fugal

Hmm, that's an interesting perspective. This made me look closer at
repository-source, and I am a little muddy now...

It looks like repository-source ties 0 or 1 repositories to 0 or 1
sources to 0 or 1 activities (searches). It seems to me that this opens
the door to data redundancy: there could be (and I think would be)
many searches in one repository/source combination. But with
repository-source as it is, we have to duplicate not only the repository
and source ids, but also the call-number and description. I don't see
why search has repository-id and source-id and repository-source has
activity-id. Why not take activity-id out of repository-source, leaving
it only to link repositories and sources, and take the
repository-id and source-id out of search? What am I missing? Why did the
Lexicon Group do it this way?
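The redundancy described here can be counted with a small Python sketch. The numbers (one census film searched 50 times), the ids and the call number are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data: one census film (repository 7, source 42) searched
# 50 times. Field names mirror the GDM columns discussed in the thread.
@dataclass(frozen=True)
class RepositorySourceRow:
    repository_id: int
    source_id: int
    activity_id: Optional[int]
    call_number: str
    description: str

SEARCHES = range(1, 51)

# Current layout: one repository-source row per search, so the call
# number and description are copied onto every row.
three_way = [RepositorySourceRow(7, 42, a, "929.3 C76", "good") for a in SEARCHES]

# Proposed layout: a single repository-source row with no activity-id;
# each search row carries (activity_id, repository_id, source_id).
catalog = [RepositorySourceRow(7, 42, None, "929.3 C76", "good")]
search = [(a, 7, 42) for a in SEARCHES]

def stored_call_numbers(rows):
    """Count how many rows store a copy of the call number."""
    return sum(1 for r in rows if r.call_number is not None)

print(stored_call_numbers(three_way), stored_call_numbers(catalog))  # 50 1
```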

hans/

* Stan Mitchell [Tue,  9 Jul 2002 at 10:35 -0700]
quote
 A few thoughts on repository-source ...
 
 IMHO, from an OO point of view, repository-source seems to be a
 separate class. It represents the association between zero or one instance
 of source and zero or one instance of repository, with the constraint that
 there be at least one source or one repository. When a search succeeds,
 a source and repository are tied together, and information such as the
 call-number and a description of the condition of the particular source
 is stored in the repository-source instance.
 
 Another way of looking at it is as the link between the Administration
 and Evidence submodels, but with perhaps a closer link to Admin.
 Maybe repository-source could be a child of search.
 
 Stan Mitchell
 
 Hans Fugal wrote:
 
 One repository exists in one place, so it seems natural to make
 repository a child element of place. I've also made place-part a child
 of place for the same reason.  
 
 The GDM calls for a sequence number on each place-part of a place, and
 an ordering scheme of the place-parts of a place. With XML, order matters
 (unless we say it doesn't), so I see no need for a sequence number; it is
 implied.
 
 On those many-to-many relationships: repository-source isn't as clear-cut
 in my mind as source-group-source was, and now I'm not as clear
 about that either. For one thing, the naming becomes hairy. Naturally
 we don't want to make source a child element of repository, because a
 source could exist in more than one repository; the other way around
 is even more ludicrous. So we need to either reference the sources in the
 repository or reference the repositories in the sources. So I think
 perhaps:
 
  <source id="film0049002">
    <citation-part citation-part-type="film">0049002</citation-part>
    <repository-source idref="fhl"/>
  </source>
 
 That name, repository-source, makes perfect sense in database context,
 but I think it's confusing in this context, where it is a child element
 of the source element. Perhaps repository-ref.  
 
 Maybe we can even allow a repository-source element inside either a source
 element or a repository element, though that may be harder to deal with in
 implementation, and there is no way to avoid the possibility of
 duplicates. So my question for anyone who has an opinion is: which is
 better, to put it in one of the elements (i.e. a source element has a
 repository-ref child element), or to have a separate (non-child)
 repository-source element?
 
 hans/
 
  
 
 
 
 
 
/quote
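Stan's reading of repository-source as a class with a constraint can be sketched as a Python dataclass that rejects an instance with neither a source nor a repository. The class and field names are assumptions; the ids film0049002 and fhl come from the XML fragment in Hans's message.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RepositorySource:
    """Hypothetical rendering of Stan's repository-source class: zero or
    one source, zero or one repository, but never neither."""
    source_id: Optional[str] = None
    repository_id: Optional[str] = None
    call_number: Optional[str] = None
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce "at least one source or one repository".
        if self.source_id is None and self.repository_id is None:
            raise ValueError("repository-source needs a source or a repository")

# A source held by the FHL, using the ids from the XML fragment.
held = RepositorySource(source_id="film0049002", repository_id="fhl")
# A repository known before any particular source is in mind.
repo_only = RepositorySource(repository_id="fhl")
```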




Re: [gdmxml] more thoughts on entering a source

2002-07-09 Thread Hans Fugal

Thanks for the explanations, Stan! Things are beginning to clear up.
See below...

* Stan Mitchell [Tue,  9 Jul 2002 at 15:03 -0700]
 Yes, you're right, it is a three-way association with 0-1 instances of
 activity.
 From a database perspective, repository-source is an associative table:
 no primary key, only foreign keys. So it would be useful for performing
 queries on various combinations of the foreign keys.
 
 I think the three ids (activity, repository, source) serve to identify a
 specific search. In Activity/Search, activity-id is a primary key.
 Each record defines one of three possible kinds of searches:
 1- source without a known repository (search for a repository?)
 2- repository without a particular source in mind (search for sources?)
 3- a source known to exist in a specific repository (normal search)
Ahh, I see why we need the three keys in the search table.
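Stan's three kinds of search can be told apart purely from which of repository-id and source-id are present, which is why search keeps both keys. A minimal Python sketch; the labels are illustrative, and rejecting a record with neither id is an assumption.

```python
from typing import Optional

def search_kind(repository_id: Optional[int], source_id: Optional[int]) -> str:
    """Classify a search record by which foreign keys are filled in,
    following Stan's three cases. Labels are illustrative."""
    if repository_id is None and source_id is None:
        # Assumption: a search must name a repository, a source, or both.
        raise ValueError("a search needs a repository, a source, or both")
    if repository_id is None:
        return "source without a known repository"
    if source_id is None:
        return "repository without a particular source"
    return "source in a specific repository"

print(search_kind(None, 42))  # source without a known repository
print(search_kind(7, None))   # repository without a particular source
print(search_kind(7, 42))     # source in a specific repository
```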

 
 On the other hand, repository-source is indexed by the three ids.
 If a repository has several copies of a source, then a given search
 in one of those copies would require a separate repository-source
 record to store the call number, etc. If you were interested in which
 copies (call-numbers) of the source you had looked at in a repository,
 a query of just those repository-source records that match
 repository-id and source-id would give that info.
But then you have to store call-numbers possibly many times. For
example, a professional researcher would doubtless perform many searches
in any particular US Census. For that Census the repository, source, call
number and description would all be the same for every repository-source
record. The only unique information in each record would be the
activity-id. Yet if we take out the activity-id from repository-source
we get rid of that redundancy. AFAICS there is no loss of querying power
when we do so - search has all three keys, so if you want to know which
searches you did on a particular call-number, you only have to query the
search table with the repository-id and source-id.  Or am I still
missing something?
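A sketch of the query described here, under the layout proposed in this message (repository-source without activity-id; search keeping repository-id and source-id), using Python's sqlite3 module with made-up data and assumed names. With one copy of the source, the join gives one call number per search; once a second copy is catalogued, each search matches both call numbers, which is the gap the July 10 message addresses with a repository-source id.

```python
import sqlite3

# Layout proposed on July 9: repository-source drops activity-id, search
# keeps both ids. Names, data and call numbers are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository_source (      -- activity_id removed
    repository_id INTEGER, source_id INTEGER,
    call_number TEXT, description TEXT
);
CREATE TABLE search (                 -- keeps both ids
    activity_id INTEGER PRIMARY KEY,
    repository_id INTEGER, source_id INTEGER
);
INSERT INTO repository_source VALUES (7, 42, '929.3 C76', 'good');
INSERT INTO search VALUES (100, 7, 42), (101, 7, 42);
""")

QUERY = """
SELECT s.activity_id, rs.call_number
FROM search AS s
JOIN repository_source AS rs
  ON rs.repository_id = s.repository_id AND rs.source_id = s.source_id
ORDER BY s.activity_id, rs.call_number
"""
one_copy = conn.execute(QUERY).fetchall()
print(one_copy)         # [(100, '929.3 C76'), (101, '929.3 C76')]

# A second copy of the same source in the same repository: every search
# now matches both call numbers, so the copy searched is ambiguous.
conn.execute("INSERT INTO repository_source VALUES (7, 42, '929.3 C76 c.2', 'fair')")
two_copies = conn.execute(QUERY).fetchall()
print(len(two_copies))  # 4
```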

 BTW, I'm a software engineer in the San Francisco Bay Area.
 Genealogy is a side interest of mine. I have studied the GDM
 spec and have an idea for representing it in UML. My background
 is more in C++, OOP, and systems programming. XML is still
 new to me.
I've often thought a UML representation of the GDM would be useful. Let
me know if you come up with one!

hans/