What you're describing is the Campbell's Soup problem, which was part of
AI research and deployment back in the 1980s, when Lisp and Business
Intelligence systems were in vogue.

However, before we can dive into the "done" part of your request, you
need to narrow it down to something more specific. The implied
assumption is that databases and information heaps are similar in nature
with respect to data, arrangement, relationships, etc. They are not.
Geopolitically, there are in excess of 190 countries, kingdoms,
provinces, protectorates, etc. There are more than 100 languages spoken,
and record keeping differs greatly in what information is considered
important.

If you want to discuss a more concrete solution to "flailing about
looking for diamonds" (research), then let's limit the discussion to
something narrower in scope with a known terminus. From your surname, I
would take an educated guess that many of your records start or lie
within the US/UK/Ireland/Scotland venue, and branch out to other points
within Europe due to intermarriage within lesser and greater royal lines
(typical for many people).

The Church has already put a great deal of effort into this data set, as
have many other organizations, partly due to US immigrant heritage at
the time, and partly due to the adoption of English record keeping,
laws, and practices. We used to be a collection of English colonies, so
that is a natural process.

Data mining in and of itself in this environment will yield a plethora
of false positives, unless you know more specifically what you are
looking for, AND you know your HISTORY in the area and time you are
researching. For example, it was common from the Middle Ages through the
Industrial Revolution for women who had lost husbands to marry a
relative (sometimes a brother of the deceased). This could be for
economic reasons, family reasons, politics, survival, or any other
reason that made sense to them at the time. On the genealogy charts, you
will see the same names, and sometimes the same information, show up for
multiple marriages. This is not a mistake, but people who do not educate
themselves and trust only in the computer will see it as an error in the
reporting. Data mining does not help here. This is not a computer
science problem. It sits in the history and genealogy domain, and
information management is merely the tool that helps us see things more
clearly, as long as we understand the CONTEXT of the data within those
domains as presented. Noodling out an algorithm to apply these kinds of
tenuous possible data relationships is noble, but not needed, given we
have been blessed with sufficient intelligence to work out the
relationships in our head, along with the gift of the Holy Ghost (for
those who will bother to use it).
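
To make the false-positive point concrete, here is a minimal sketch in
Python. The records, names, parish, and the matching rule are all
invented for illustration; this is not how PAF or any real tool works.
It only shows how a context-free matcher would treat a widow's second
marriage to her late husband's brother as a duplicate entry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Marriage:
    bride: str
    groom: str
    year: int
    parish: str


def naive_duplicates(records):
    """Group marriages by (bride, groom surname, parish).

    Any group with more than one record is reported as a "duplicate".
    This rule is a hypothetical, deliberately context-free heuristic.
    """
    groups = defaultdict(list)
    for r in records:
        key = (r.bride, r.groom.split()[-1], r.parish)
        groups[key].append(r)
    return [g for g in groups.values() if len(g) > 1]


# Invented sample data: Mary marries John, is widowed, then marries
# his brother William in the same parish. Both records are genuine.
records = [
    Marriage("Mary Hale", "John Carter", 1702, "St. Giles"),
    Marriage("Mary Hale", "William Carter", 1709, "St. Giles"),
    Marriage("Ann Reed", "Thomas Pike", 1705, "St. Giles"),
]

flagged = naive_duplicates(records)
for group in flagged:
    print("possible duplicate:", [(m.groom, m.year) for m in group])
```

The matcher flags the two Carter marriages even though both are real
events. Only someone who knows the customs of the period can tell that
the "duplicate" is actually the expected pattern.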

Applications like PAF, and tools like GEDCOM and its derivatives, are
valuable in that they help to organize existing data for ANALYSIS. They
do not produce the end result.
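
As a rough illustration of "organizing for analysis", the sketch below
reads a few GEDCOM-style lines (each line is a level number, an optional
@XREF@ identifier, a tag, and an optional value) and lists every spouse
recorded for each person. The sample data is invented, and the parser is
an assumption-laden toy: it only understands INDI, FAM, NAME, HUSB, and
WIFE, and ignores everything else a real GEDCOM file contains.

```python
from collections import defaultdict

# Invented sample in GEDCOM line format: "level [xref] tag [value]".
SAMPLE = """\
0 @I1@ INDI
1 NAME Mary /Hale/
0 @I2@ INDI
1 NAME John /Carter/
0 @I3@ INDI
1 NAME William /Carter/
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
0 @F2@ FAM
1 HUSB @I3@
1 WIFE @I1@
"""


def parse(text):
    """Collect individual names and husband/wife links per family."""
    names, families = {}, defaultdict(dict)
    current = None  # (xref, record type) of the enclosing level-0 record
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # Records with an xref look like "0 @I1@ INDI".
            current = (parts[1], parts[2]) if len(parts) == 3 else None
        elif parts[0] == "1" and current and len(parts) == 3:
            xref, rec_type = current
            tag, value = parts[1], parts[2]
            if rec_type == "INDI" and tag == "NAME":
                names[xref] = value.replace("/", "")
            elif rec_type == "FAM" and tag in ("HUSB", "WIFE"):
                families[xref][tag] = value
    return names, families


def spouses_by_person(names, families):
    """Map each person's name to the list of recorded spouses."""
    result = defaultdict(list)
    for fam in families.values():
        husb, wife = fam.get("HUSB"), fam.get("WIFE")
        if husb and wife:
            result[names.get(wife, wife)].append(names.get(husb, husb))
            result[names.get(husb, husb)].append(names.get(wife, wife))
    return dict(result)


names, families = parse(SAMPLE)
spouses = spouses_by_person(names, families)
for person, partners in spouses.items():
    print(person, "->", partners)
```

The output puts both of Mary's marriages side by side. Deciding what
that pattern means is still left to the researcher.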

So, let's talk about a narrower, more concrete scope for your problem.

Steven H. McCown wrote:
> Has anyone ever noticed that this list tends to concentrate on hashing and
> re-hashing which OSS tools are best?  Then, the discussion moves to whether
> client-server, webapps, or standalone apps are best.  Next, we always jump
> on to (my favorite) legal issues.  Goto line 1 and repeat...
>
> I'd like to take a sideline from that and discuss problem solving issues --
> just for a minute.
>
> I did some research for my family and came to a dead end.  At that point, I
> sat in several libraries and read book after book.  Eventually, place names
> and dates started to sound familiar.  I started reading genealogies for
> unrelated people that lived in the same place/time as my family.  Finally, I
> found families that had intermarried and surprisingly had clues for my own
> family.  I've since been able to tie into some very old family lines.
>
> That will sound very familiar to most researchers as that is the way
> genealogy is often done.  
>
> With all that we know about computers, algorithms, searching, data mining,
> etc., is there anything that we can do to affect the research process?  To
> me, as a researcher, whether PAF is AJAX, C++, Python, is mainly a
> distraction.  The only real requirement is that gen apps be available to
> everyone -- whether on the net or not.
>
> So, the discussion that I'd like to hear is not an Info Tech discussion, but
> a hardcore Computer Science one.  
>
> Given the research paradigm that I described above, have you done anything
> that might allow researchers to data mine across databases and make
> inferences or suggestions to where to look when we get stumped? 
>
> Thanks,
>
> Steve
>
> _______________________________________________
> Ldsoss mailing list
> [email protected]
> http://lists.ldsoss.org/mailman/listinfo/ldsoss