Hello All,

My name is Tres. I'm an experienced C/C++ programmer trying to get better at some other languages, currently Perl and the Tk extension set for Perl. I'm also new to relational databases. In the past I have usually gotten frustrated with their complexity and written my own database code into whatever program needed one. That has always worked out for me, but now I'm creating a data set that will be used and accessed by many programs. If I wrote my own, I would need access methods for it in many languages. Any change to the format would then require a special program to convert the database, plus a rebuild of every program that accesses it. The obvious solution is a relational database. Since this is going to be a complex learning experience, I'm preparing to write many programs about as useful as "hello world" in an attempt to figure it all out.
My question to the group is this: what resources are available on the web for learning relational database programming with MySQL and Perl or C? I'm looking for something heavy on code examples, preferably ones I can cut and paste into an editor and then tweak to learn from. I've already gotten past creating a table and filling it with data. Now I want to search some of the non-key fields for duplicates. I'm not asking for code snippets, since I like to do my own work. What I'm asking for is tutorials and simple programs.

An address book program would be a good start, especially if it came in several versions with increasing functionality. One version might filter out everybody except people from Colorado. Do you even need a key field in a database? Each person may have multiple phone numbers, yet the phone number is the only field that is guaranteed to be unique. How would you handle a key field in that situation? Some people may not give you their home phone numbers. Multiple employees of the same company might have the same phone number, differing only by extension.

One of the things I'm very interested in is handling duplicates. Mr. Robert Smith and your friend Bob may in fact be the same person. To verify that, the other fields should be checked. If they contain almost the same data, then they are probably the same person and the records should be combined.

I'm actually trying to write a program that sorts through images looking for duplicates. It first compares their MD5s and then does a byte-for-byte comparison to confirm that they are indeed identical. We have a digital camera, and it seems that we have many duplicate files on the system. The files are uniquely named when they come off the camera, but they tend to get renamed and put in different directories. Copies then end up on the web server, they get emailed back and forth, and all kinds of things happen to create multiple copies.
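The address-book questions can be sketched as a small example. This sketch uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL, so it runs without a server. The table and column names (`contacts`, `phones`, `state`, and so on) are hypothetical. It shows one common answer to the key-field question: give each person a surrogate integer key and move phone numbers into a separate table. A person can then have zero, one, or many numbers, and coworkers can share a number that differs only by extension. The SELECT statements are plain SQL that should carry over to MySQL, where the key would typically be declared `INT AUTO_INCREMENT PRIMARY KEY`. Perl DBI also uses `?` placeholders.

```python
import sqlite3

# Hypothetical schema: a surrogate key identifies each person, because no
# natural field (name, phone number, ...) is reliably unique or always present.
SCHEMA = """
CREATE TABLE contacts (
    id         INTEGER PRIMARY KEY,  -- surrogate key (AUTO_INCREMENT in MySQL)
    first_name TEXT,
    last_name  TEXT,
    company    TEXT,
    state      CHAR(2)
);
CREATE TABLE phones (
    contact_id INTEGER NOT NULL REFERENCES contacts(id),
    kind       TEXT,                 -- 'home', 'work', ... (may be absent)
    number     TEXT NOT NULL,
    extension  TEXT                  -- coworkers may share a number
);
"""


def build_sample_db():
    """Create an in-memory database with a few made-up contacts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO contacts (id, first_name, last_name, company, state) "
        "VALUES (?, ?, ?, ?, ?)",
        [(1, "Robert", "Smith", "Acme", "CO"),
         (2, "Bob", "Smith", None, "CO"),
         (3, "Alice", "Jones", "Acme", "CA")])
    conn.executemany(
        "INSERT INTO phones (contact_id, kind, number, extension) VALUES (?, ?, ?, ?)",
        [(1, "work", "555-0100", "12"),
         (3, "work", "555-0100", "40"),  # same switchboard, different extension
         (2, "home", "555-0199", None)])
    return conn


def people_in_state(conn, state):
    """Filter out everybody except people from the given state."""
    return conn.execute(
        "SELECT first_name, last_name FROM contacts WHERE state = ? ORDER BY id",
        (state,)).fetchall()


def duplicate_last_names(conn):
    """Search a non-key field for values that occur more than once."""
    return conn.execute(
        "SELECT last_name, COUNT(*) FROM contacts "
        "GROUP BY last_name HAVING COUNT(*) > 1").fetchall()


def phones_for(conn, contact_id):
    """One-to-many lookup: a contact may have zero, one, or several numbers."""
    return conn.execute(
        "SELECT kind, number, extension FROM phones WHERE contact_id = ?",
        (contact_id,)).fetchall()


if __name__ == "__main__":
    db = build_sample_db()
    print(people_in_state(db, "CO"))
    print(duplicate_last_names(db))
    print(phones_for(db, 1))
```

With phone numbers in their own table, a missing home number is simply a missing row rather than an empty column. Nothing in the schema forces a phone number to be unique.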
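The "Robert Smith versus Bob" check can be sketched as a similarity score with four steps:

- Normalize each field.
- Treat a known nickname as the same first name.
- Skip any field that either record leaves blank.
- Call two records probable duplicates when enough of the remaining fields agree.

The nickname table, the field list, and the 0.75 threshold below are all hypothetical choices for illustration, not a standard algorithm. In practice, candidate pairs would usually come from a database query that groups records on a cheap field such as last name. That way only a handful of pairs need this slower comparison. The sketch is in Python.

```python
# Hypothetical nickname table; a real one would need many more entries.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert",
             "bill": "william", "liz": "elizabeth"}

# Hypothetical list of fields worth comparing.
FIELDS = ("first_name", "last_name", "company", "state", "email")


def normalize(field, value):
    """Trim and lower-case a value; map known nicknames to a canonical first name."""
    if value is None:
        return None
    value = value.strip().lower()
    if not value:
        return None  # an empty string counts as "not given"
    if field == "first_name":
        value = NICKNAMES.get(value, value)
    return value


def compare(a, b):
    """Return (fields that agree, fields filled in on both records)."""
    agreed = compared = 0
    for field in FIELDS:
        x = normalize(field, a.get(field))
        y = normalize(field, b.get(field))
        if x is None or y is None:
            continue  # a missing field is not evidence either way
        compared += 1
        if x == y:
            agreed += 1
    return agreed, compared


def probably_same_person(a, b, threshold=0.75, min_fields=2):
    """Hypothetical rule: enough shared fields, and most of them agree."""
    agreed, compared = compare(a, b)
    return compared >= min_fields and agreed / compared >= threshold


if __name__ == "__main__":
    robert = {"first_name": "Robert", "last_name": "Smith", "state": "CO"}
    bob = {"first_name": "Bob", "last_name": "smith ", "state": "CO", "email": ""}
    print(compare(robert, bob), probably_same_person(robert, bob))
```

Because a blank field is skipped rather than counted as a mismatch, a record that lacks a company or email address is not penalized. Pairs that pass could then be merged by hand or by a separate step.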
A SUID program takes the images from the camera and places them in a directory that is accessible to the entire family, but family members don't have delete privileges on that directory. Many users make a copy of an image in their own directories, and I'm trying to find those duplicates. I can't use the filenames because they are almost always different. My program recursively scans a directory and adds the files to a table. For each file it stores the file name, the directory, the time and date stamps, and the MD5. It can be run against my users' home directories, the root of the camera's picture directory, and the web server's document root. All of that works now.

What I need to do next is figure out how to search the table for matching MD5s and then compare the files to see whether they are truly identical. Then I need to replace many of the duplicate files with links. I can't have links pointing into a user's home directory, because file permissions would prevent others from accessing the images. It seems logical for everybody to have links in their directories pointing to the images in the camera directory. That stops working once you consider the web server, which can't follow links outside the document tree. I can figure out all of that logic. What I need to learn is how to search the database for the duplicate files. Later I'll also need to check new files as they are added to see whether they are duplicates. This actually seems easier than teaching everybody how to use symbolic links and which direction they can point.

This whole program is more an exercise in learning MySQL than an attempt to recover a few hundred MB on a 120 GB hard drive. Any links, example programs, and help would be greatly appreciated.
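The duplicate search can be sketched as a query followed by a file comparison. Like the other examples, it uses Python's `sqlite3` module and an in-memory SQLite database as a stand-in for MySQL. The `files` table and its columns are hypothetical, but they mirror the fields the scanner already stores. The `GROUP BY md5 HAVING COUNT(*) > 1` query finds MD5 values that occur more than once. Python's standard `filecmp.cmp(..., shallow=False)` then does the byte-for-byte confirmation, so an MD5 collision cannot make two different files look like one. An index on the MD5 column keeps the per-MD5 lookups cheap as the table grows.

```python
import filecmp
import hashlib
import os
import sqlite3
import tempfile

# Hypothetical table mirroring what the scanner records for each file.
SCHEMA = """
CREATE TABLE files (
    id    INTEGER PRIMARY KEY,   -- surrogate key: one MD5 may appear many times
    name  TEXT NOT NULL,
    dir   TEXT NOT NULL,
    mtime INTEGER,
    md5   CHAR(32) NOT NULL
);
CREATE INDEX files_md5 ON files (md5);
"""


def duplicate_candidates(conn):
    """Return {md5: [path, ...]} for every MD5 stored more than once."""
    dup_md5s = [row[0] for row in conn.execute(
        "SELECT md5 FROM files GROUP BY md5 HAVING COUNT(*) > 1")]
    result = {}
    for md5 in dup_md5s:
        rows = conn.execute(
            "SELECT dir, name FROM files WHERE md5 = ? ORDER BY dir, name", (md5,))
        result[md5] = [os.path.join(d, n) for d, n in rows]
    return result


def identical_groups(paths):
    """Split same-MD5 paths into groups that are byte-for-byte identical."""
    groups = []
    for path in paths:
        for group in groups:
            if filecmp.cmp(group[0], path, shallow=False):
                group.append(path)
                break
        else:
            groups.append([path])
    return [g for g in groups if len(g) > 1]


def demo():
    """Write three small files, two with identical bytes, and find the pair."""
    tmp = tempfile.mkdtemp()
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    samples = (("a.jpg", b"same bytes"), ("b.jpg", b"same bytes"), ("c.jpg", b"other"))
    for name, data in samples:
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        conn.execute(
            "INSERT INTO files (name, dir, mtime, md5) VALUES (?, ?, ?, ?)",
            (name, tmp, int(os.path.getmtime(path)), hashlib.md5(data).hexdigest()))
    found = {md5: identical_groups(paths)
             for md5, paths in duplicate_candidates(conn).items()}
    return tmp, found


if __name__ == "__main__":
    print(demo())
```

The same two SELECT statements can be sent through Perl DBI or the MySQL C API. Deciding which copy becomes the original and which copies become links is left to the logic described above.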
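Checking new files as they arrive can be sketched under the same assumptions: Python's `sqlite3` stands in for MySQL, and the `files` table is hypothetical. The sketch works in four steps:

1. Hash the new file in chunks, so a large image is never read into memory all at once.
2. Look up existing rows with the same MD5 through the index.
3. Confirm each candidate byte for byte.
4. Record the new file either way.

Whether a confirmed duplicate should be recorded or turned into a link right away is a policy choice this sketch does not make. The helper names `md5_of` and `add_file` are made up for illustration.

```python
import filecmp
import hashlib
import os
import sqlite3
import tempfile

# Hypothetical table, same layout the scanner would use.
SCHEMA = """
CREATE TABLE files (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    dir   TEXT NOT NULL,
    mtime INTEGER,
    md5   CHAR(32) NOT NULL
);
CREATE INDEX files_md5 ON files (md5);
"""


def md5_of(path, chunk_size=1 << 16):
    """MD5 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_file(conn, path):
    """Record `path`; return already-recorded paths whose bytes are identical."""
    digest = md5_of(path)
    identical = []
    for d, n in conn.execute(
            "SELECT dir, name FROM files WHERE md5 = ?", (digest,)).fetchall():
        known = os.path.join(d, n)
        # Skip rows for files that have since been deleted or moved.
        if os.path.exists(known) and filecmp.cmp(known, path, shallow=False):
            identical.append(known)
    directory, name = os.path.split(os.path.abspath(path))
    conn.execute(
        "INSERT INTO files (name, dir, mtime, md5) VALUES (?, ?, ?, ?)",
        (name, directory, int(os.path.getmtime(path)), digest))
    return identical


def demo():
    """Add three files; the second is a renamed copy of the first."""
    tmp = tempfile.mkdtemp()
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    results = []
    for name, data in (("IMG_0001.JPG", b"pixels"),
                       ("renamed.jpg", b"pixels"),
                       ("other.jpg", b"different")):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        results.append(add_file(conn, path))
    return tmp, results


if __name__ == "__main__":
    print(demo())
```

Because the check happens at insert time, the table never needs a full rescan just to catch the newest copies.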
Thanks for your time,
Tres

--
Tres Melton <[EMAIL PROTECTED]>