Hello All,

        My name is Tres.  I'm an experienced C/C++ programmer trying to get
better at some other languages, currently Perl and the Tk extension set
for Perl.  I'm also new to relational databases.  I have usually gotten
frustrated with their complexity and written my own data store into
whatever program needed one.  This has always worked
out for me in the past but I am now looking at creating a data set that
will be used and accessed by many programs.  If I were to write my own, I
would need access methods for it in many languages, and any tweak would
require a special program to change the database and then a rebuild of
every program that accessed it.  The obvious solution is to use a
relational database.  Since this is going to be a complex learning
experience, I'm preparing myself to write many programs about as useful
as "hello world" in an attempt to figure this all out.

        My question to the group is:  what resources are available on the web
for learning relational database programming using MySQL and Perl or C?
I'm looking for something heavy on code examples, preferably ones that
can be cut and pasted into an editor and then tweaked to learn.  I've
already gotten past the problem of creating a table and filling it with
data.  Now what I want to do is search some of the non-key fields for
duplicates.  I'm not asking for code snippets as I like to do my own
work.  What I'm asking for is tutorials and simple programs.  An address
book program would be a good start, especially if it came in several
versions with additional functionality, such as filtering out everybody
except those from Colorado.

        Some specific questions:  Do you even need a key field in a
database?  How would you handle a key field in a database where a person
may have multiple phone numbers (the phone number is the only field that
is guaranteed to be unique)?  Some people may not give you their home
phone numbers.  Multiple employees of the same company might share a
phone number and differ only by extension.  One of the things that I'm
very interested in is handling duplicates.  Mr. Robert Smith and your
friend Bob may in fact be the same person; to verify that, the other
fields should be checked.  If they contain almost the same data then
they are probably the same person and the records should be combined.
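
        To make the address book questions concrete, here is a minimal
sketch, not a definitive design.  It assumes Python's built-in sqlite3
module as a stand-in for MySQL, and the `people` table, its columns, and
the sample rows are all hypothetical.  It shows a surrogate integer key
(so no real-world field has to be unique), a filter on state, and a
search of a non-key field for repeated values.

```python
import sqlite3

# ASSUMPTION: sqlite3 stands in for MySQL; the schema and rows are made up.
conn = sqlite3.connect(":memory:")

# A surrogate integer key means no real-world field has to be unique.
conn.execute(
    "CREATE TABLE people ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT, state TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, state, phone) VALUES (?, ?, ?)",
    [
        ("Robert Smith", "CO", "555-0100"),
        ("Bob", "CO", "555-0100"),
        ("Alice Jones", "NM", "555-0199"),
        ("Carol White", "CO", None),  # no phone number given
    ],
)

# Everybody from Colorado.
colorado = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE state = ? ORDER BY id", ("CO",))]

# Phone numbers that appear on more than one row (a non-key field).
dup_phones = [row[0] for row in conn.execute(
    "SELECT phone FROM people WHERE phone IS NOT NULL"
    " GROUP BY phone HAVING COUNT(*) > 1")]

# Candidate duplicate people: the rows sharing the first repeated number.
candidates = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE phone = ? ORDER BY id", (dup_phones[0],))]

print(colorado)
print(dup_phones)
print(candidates)
```

The WHERE, GROUP BY, and HAVING clauses are standard SQL and should carry
over to MySQL, though the placeholder syntax depends on the driver.  The
rows in `candidates` would still need their other fields compared before
deciding that they describe the same person.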

        I'm actually trying to write a program that sorts through images
looking for duplicates, first by comparing their MD5s and then with a
byte-for-byte compare to confirm that they are indeed duplicates.  We
have a digital camera and it seems that we have many duplicate files on
the system.  Although they are uniquely named when they come off of the
camera, they tend to get renamed and put in different directories.
Copies then end up on the web server, they get emailed back and forth,
and all kinds of things happen to create multiple copies of the files.
A SUID program takes them from the camera and places them in a directory
that is accessible to the entire family, but they don't have delete
privileges on the directory.  Many users might make a copy of an image
in their own directories, and I'm trying to find the duplicates.  I
can't use the filenames because they always seem to be different.

        My program recursively scans a directory and adds the files to a
table.  It adds the file names, the directory, the time and date stamps,
and the MD5s of the files.  It can be run against the various home
directories of my users, the root of the camera's picture directory, and
the www server's root document directory.  All of that works now.  What
I need to do is figure out how to look through the table of data for
matching MD5s and then compare the files to see if they are truly
identical.  Then I need to replace many of the duplicate files with
links.  I can't have links pointing into a user's home directory, as
file permissions would prevent others from accessing the images.  It
seems logical for everybody to have links in their directory to the
images in the camera directory, until you think about the web server,
which can't follow links outside of the document tree.  I can figure out
all of that logic; what I need to learn how to do is search for the
duplicate files in the database.  Then in the future I need to check new
files as they are added to see if they are duplicates.  This actually
seems easier than teaching everybody how to use symbolic links and which
direction they can go.
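
        The two-stage check described above (matching MD5s first, then a
byte-for-byte compare) could be sketched roughly as follows.  This is an
illustration under stated assumptions rather than the actual program: it
uses Python's built-in sqlite3 module in place of MySQL, and the `files`
table, its `id`, `path`, and `md5` columns, and the helper names `md5_of`
and `confirmed_duplicates` are hypothetical.  The real table stores the
file name and directory separately, along with time and date stamps.

```python
import filecmp
import hashlib
import os
import sqlite3
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def confirmed_duplicates(conn):
    """Return (path_a, path_b) pairs with equal MD5s and identical bytes."""
    # Self-join: pair each row with every later row that has the same MD5.
    rows = conn.execute(
        "SELECT a.path, b.path FROM files AS a"
        " JOIN files AS b ON a.md5 = b.md5 AND a.id < b.id"
        " ORDER BY a.id, b.id"
    )
    # shallow=False forces a comparison of file contents, not just stat() data.
    return [(a, b) for a, b in rows if filecmp.cmp(a, b, shallow=False)]


# Demo with throwaway files.  ASSUMPTION: hypothetical table layout,
# with sqlite3 standing in for MySQL.
workdir = tempfile.mkdtemp()
contents = {
    "camera.jpg": b"image-one",
    "copy.jpg": b"image-one",
    "other.jpg": b"image-two",
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, md5 TEXT)")
for name, data in contents.items():
    path = os.path.join(workdir, name)
    with open(path, "wb") as handle:
        handle.write(data)
    conn.execute("INSERT INTO files (path, md5) VALUES (?, ?)", (path, md5_of(path)))

pairs = confirmed_duplicates(conn)
print(pairs)
```

Checking a newly added file would follow the same idea: compute its MD5,
select the existing rows with that MD5, and byte-compare against each
match before inserting the new row.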

        This whole program is more an exercise in learning MySQL than it is
about reclaiming a few hundred MB on a 120GB hard drive.  Any links,
example programs, and help would be greatly appreciated.

Thanks for your time,
Tres

-- 
Tres Melton <[EMAIL PROTECTED]>

