What you're describing is the Campbell's Soup problem, which was part of AI research and deployment back in the 1980s, when Lisp and Business Intelligence systems were in vogue.
However, before we can dive into the "done" part of your request, you need to narrow it down to something more specific. The implied assumption is that databases and information heaps are similar in nature with respect to data, arrangement, relationships, etc. They are not. Geopolitically, there are in excess of 190 countries, kingdoms, provinces, protectorates, etc. There are more than 100 languages spoken, plus there are great dissimilarities in record keeping in terms of which information is considered important.

If you want to discuss a more concrete solution to "flailing about looking for diamonds" (research), then let's narrow the discussion to something smaller in scope with a known terminus. From your surname, I would take an educated guess that many of your records start or lie within the US/UK/Ireland/Scotland venue, and branch out to other points within Europe due to intermarriage within lesser and greater royal lines (typical for many people). The Church has already put a great deal of effort into this data set, as have many other organizations, partly due to US immigrant heritage at the time, and partly due to the adoption of English record keeping, laws, and practices. We used to be a collection of English colonies, so that is a natural process.

Data mining in and of itself in this environment will yield a plethora of false positives, unless you know more specifically what you are looking for, AND you know your HISTORY in the area and time you are researching. For example, it was common from the Middle Ages through the Industrial Revolution for women who had lost husbands to marry a relative (sometimes a brother of the deceased). This could be for economic reasons, family reasons, politics, survival, or any other reason that made sense to them at the time. On the genealogy charts, you will see the same names, and sometimes the same information, show up for multiple marriages.
This is not a mistake, but people who do not educate themselves and trust only in the computer will see it as an error in the reporting. Data mining does not help here.

This is not a computer science problem. It sits in the history and genealogy domain, and information management is merely the tool that helps us see things more clearly, as long as we understand the CONTEXT of the data within those domains as presented. Noodling out an algorithm to apply these kinds of tenuous possible data relationships is noble, but not needed, given we have been blessed with sufficient intelligence to work out the relationships in our head, along with the gift of the Holy Ghost (for those who will bother to use it).

Applications like PAF, and tools like GEDCOM and its derivatives, are valuable in that they help to organize existing data for ANALYSIS. They do not produce the end result.

So, let's talk about a narrower, more concrete scope for your problem.

Steven H. McCown wrote:
> Has anyone ever noticed that this list tends to concentrate on hashing and
> re-hashing which OSS tools are best? Then, the discussion moves to whether
> client-server, webapps, or standalone apps are best. Next, we always jump
> on to (my favorite) legal issues. Goto line 1 and repeat...
>
> I'd like to take a sideline from that and discuss problem solving issues --
> just for a minute.
>
> I did some research for my family and came to a dead end. At that point, I
> sat in several libraries and read book after book. Eventually, place names
> and dates started to sound familiar. I started reading genealogies for
> unrelated people that lived in the same place/time as my family. Finally, I
> found families that had intermarried and surprisingly had clues for my own
> family. I've since been able to tie into some very old family lines.
>
> That will sound very familiar to most researchers as that is the way
> genealogy is often done.
>
> With all that we know about computers, algorithms, searching, data mining,
> etc., is there anything that we can do to affect the research process? To
> me, as a researcher, whether PAF is AJAX, C++, Python, is mainly a
> distraction. The only real requirement is that gen apps be available to
> everyone -- whether on the net or not.
>
> So, the discussion that I'd like to hear is not an Info Tech discussion, but
> a hardcore Computer Science one.
>
> Given the research paradigm that I described above, have you done anything
> that might allow researchers to data mine across databases and make
> inferences or suggestions to where to look when we get stumped?
>
> Thanks,
>
> Steve

_______________________________________________
Ldsoss mailing list
[email protected]
http://lists.ldsoss.org/mailman/listinfo/ldsoss
