Re: [CODE4LIB] Metadata war stories...
intended for a single institution, or worse, a specific OPAC. Due to the ambiguity in the spec and the desire to just make it look the way I want it to look in my OPAC, the temptation is simply too great. In the end, we have data that couldn't possibly meet the standard as described, which means we spend more time than expected parsing it in the next system.

In our case we work through these issues with an army of code tests. Our catalogers and reference staff find broken examples of MARC holdings data parsing in our newest discovery system, we gather the real-world MARC records as a test data set, and then we write a bunch of RSpec tests so we don't undo previous bug fixes while we deal with the current ones. The challenge is coming up with a fast, responsive mechanism for adding a record to the test set once it's identified.

-Steve

Bess Sadler wrote, On 1/27/12 8:26 PM:

I remember the required field operation of... aught six? aught seven? It all runs together at my age. Turns out, for years people had been making shell catalog records for items in the collection that needed to be checked out but hadn't yet been barcoded. Some percentage of these people opted not to record any information about the item other than the barcode it left the building under, presumably because they were in a hurry. If there were such a thing as a metadata crime, that'd be it.

We were young and naive; we thought, why not just index all our catalog records into Solr? Little did we know what unholy abominations we would uncover. Out of nowhere, we were surrounded by zombie MARC records, horrible half-created things, never meant to roam the earth or even to exist in a sane mind. They could tell us nothing about who they were, or what book they had once tried to describe; they could only stare blankly and repeat in mangled agony: required field! required field! required field! over and over... It took us weeks to put them all out of their misery.

This is the first time I've ever spoken of this publicly. The support group is helping with the nightmares, but sometimes still, I wake in a cold sweat, wondering... did we really find them all?
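Steve's fixture-driven loop (real-world records in, one expectation per past bug, re-run on every change) might be sketched roughly like this. The `parse_holdings` routine, the 866-style holdings strings, and the expected values are all invented for illustration, and plain Ruby assertions stand in for RSpec so the sketch is self-contained:

```ruby
# A minimal, hypothetical version of the regression workflow Steve
# describes: each reported parsing bug becomes a fixture string plus
# an expectation, so old fixes are re-checked on every run.

def parse_holdings(text)
  # split an 866 $a style statement like "v.1-v.10 (1990-1999)"
  m = text.match(/\Av\.(\d+)-v\.(\d+)\s*\((\d{4})-(\d{4})\)\z/)
  return nil unless m
  { first_vol: m[1].to_i, last_vol: m[2].to_i,
    first_year: m[3].to_i, last_year: m[4].to_i }
end

# fixtures gathered from real bug reports, one per past regression
FIXTURES = {
  "v.1-v.10 (1990-1999)" => { first_vol: 1, last_vol: 10,
                              first_year: 1990, last_year: 1999 },
  "bound with v.2"       => nil  # free text we must not misparse
}

FIXTURES.each do |input, expected|
  actual = parse_holdings(input)
  raise "regression on #{input.inspect}" unless actual == expected
end
puts "#{FIXTURES.size} holdings fixtures pass"
```

Presumably each fixture would really be a full MARC record checked into the repository, and a reported bug wouldn't be closed until its record joins the set.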
Re: [CODE4LIB] Metadata war stories...

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] Metadata war stories...
Oh, I should have also mentioned that some of the worst problems occur when people treat their metadata like it will never leave their institution. When that happens you get all kinds of crazy cruft in a record. For example, just off the top of my head:

* Embedded HTML markup (one of my favorites is an img tag)
* URLs to remote resources that are hard-coded to go through a particular institution's proxy
* Notes that only have meaning for that institution
* Text that is meant to display to the end-user but may only do so in certain systems; e.g., "Click here" in a particular subfield.

Sigh...
Roy

On Fri, Jan 27, 2012 at 4:17 PM, Roy Tennant roytenn...@gmail.com wrote:

Thanks a lot for the kind shout-out, Leslie. I have been pondering what I might propose to discuss at this event, since there is certainly plenty of fodder. Recently we (OCLC Research) did an investigation of 856 fields in WorldCat (some 40 million of them), and that might prove interesting. By the time ALA rolls around there may be something else entirely I could talk about. That's one of the wonderful things about having 250 million MARC records sitting out on a 32-node cluster. There are any number of potentially interesting investigations one could do.

Roy
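The institution-specific cruft Roy lists lends itself to a quick automated sweep over exported field values. A minimal sketch; the patterns, the `ezproxy` hostname convention, and the sample strings are illustrative guesses, not a definitive checklist:

```ruby
# Hedged sketch: flag the kinds of institution-specific cruft Roy
# lists when it turns up in field values headed for another system.

CRUFT_CHECKS = {
  embedded_html: /<\s*(?:img|a|b|i|br)\b/i,         # e.g. an img tag in a note
  proxied_url:   %r{https?://\S*ezproxy\S*}i,       # hard-coded proxy hop
  click_here:    /click here/i                      # display text stored as data
}.freeze

# return the names of every check a field value trips
def cruft_in(value)
  CRUFT_CHECKS.select { |_name, pattern| value =~ pattern }.keys
end

cruft_in('See <img src="logo.gif"> for details')                  # => [:embedded_html]
cruft_in('http://ezproxy.example.edu/login?url=http://jstor.org') # => [:proxied_url]
cruft_in('Click here for full text')                              # => [:click_here]
```

Run over a full export, a report of tripped checks per record would give catalogers a worklist instead of a surprise in the next system.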
Re: [CODE4LIB] Metadata war stories...
EDIT ME http://ead.lib.virginia.edu/vivaxtf/view?docId=uva-sc/viu00888.xml;query=;brand=default#adminlink
Re: [CODE4LIB] Metadata war stories...
Roy's fabulous Bitter Harvest paper: http://roytennant.com/bitter_harvest.html
[CODE4LIB] Metadata war stories...
Hi all,

For our preconference, “Digging into Metadata,” we’d like to get a little discussion going to build on once the preconference rolls around. A good part of our discussion will focus on metadata issues and how folks have worked through said issues or have utilized metadata in a unique way while keeping the metadata’s context in mind. Some examples include:

- Dirty data issues when switching discovery layers or using legacy/vendor metadata (ex. HathiTrust)
- Dealing with free text in MARC records and how to parse them w/o too much heartache
- Batch creating and editing metadata

Some of you have already touched on this in the last preconference email thread, but we'd like to get some more examples to focus on. What are your metadata war stories?

Thanks,
Becky

-
Becky Yoose
Systems Librarian
Grinnell College
Re: [CODE4LIB] Metadata war stories...
I will contribute one particularly heartbreaking bit from my own current metadata saga. I'm in one of these hybrid museum/research library institutions where the library side has an aging MARC catalog with its own issues that I won't go into at the moment. The museum side has a commercial collection management database that recently changed names from ReDiscovery to Proficio.

The good news about this database is that after some digging I uncovered an export method that is fairly free-form and allows me to write a template to export directly to MODS XML, which is my intended middle ground between library and museum (the only trick is getting your hands on the Top Sekrit database field names).

The bad (actually painful) news was discovering how data that had been painstakingly entered by hand over 15 years into separate fields was being munged together as free text within the database. Nobody knew this was happening until I started trying to export data. So, for example, a name and its associated role and dates would have been entered into appropriate separate authority-controlled fields in a data-entry form, but then would be stuffed into a single field in the database. The only consolation is that they do stuff in some text delimiters that are (mostly) uncommon characters (pipes and underscores), so it is possible to break the fields back out; it's just very time consuming and prone to introducing errors.

Lesson learned: vigorously test how well the data comes out of any system before investing any time putting data into it. Also invest in time travel to go back and apply this lesson at the beginning...

-Derek
@dmer
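If the delimiters really are consistent pipes and underscores, breaking the munged names back out might look like the sketch below. The field layout (name, role, dates) and the delimiter semantics are assumptions here, which is exactly why Derek calls the process error-prone:

```ruby
# Hypothetical de-munging of a flattened Proficio-style field.
# Assumed convention: subvalues joined with pipes, underscores
# standing in for spaces inside a subvalue.

RECORD = "Smith,_John|sculptor|1850-1910"

def split_munged(field)
  name, role, dates = field.split("|", 3)
  {
    name:  name.tr("_", " "),  # restore the spaces
    role:  role,
    dates: dates
  }
end

split_munged(RECORD)
# => {:name=>"Smith, John", :role=>"sculptor", :dates=>"1850-1910"}
```

The fragile part is every record that used a literal pipe or underscore in its data; those are the ones that need eyeballing, and why "mostly uncommon" delimiters still mean manual review.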
Re: [CODE4LIB] Metadata war stories...
For our preconference, “Digging into Metadata,” we’d like to get a little discussion going to build on once the preconference rolls around. ...
- Dealing with free text in MARC records and how to parse them w/o too much heartache

You can find horrendous stories even with data that's fully structured. Multiple libraries have had call numbers not migrated (or the wrong one migrated, due to the unfortunate practice at most libraries of retaining multiple call numbers) during an ILS migration -- as you can imagine, that would make books much harder to find on the shelves. I can't remember the names of the institutions this happened to, but you could probably find someone who can give you precise details on the autocat list.

There is the constant problem that in any migration, the data is not structured or used the same way in the new system as in the old -- some fields exist in one system but not the other, different numbers and types of fields are used to represent concepts, etc.

I've personally encountered cases where the data that comes out of a system is outright invalid or gets mangled in bizarre ways by the export routine itself. For example, there's a system used for many digital archives that splits a field in two any time a character that needs to be represented by an XML entity is encountered. Name withheld to protect the guilty.

kyle
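A cheap first line of defense against the kind of mangled export Kyle describes is checking that every exported record is at least well-formed before any migration work begins. A sketch using Ruby's stdlib REXML, with invented sample records; the second one has lost a closing tag, the way a field "split in two" might surface:

```ruby
# Well-formedness smoke test for exported records, using only stdlib.
require "rexml/document"

EXPORTS = [
  "<record><title>Maps and atlases</title></record>",  # fine
  "<record><title>split in two</record>"               # lost its end tag
]

def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

EXPORTS.map { |xml| well_formed?(xml) }  # => [true, false]
```

Well-formedness is a low bar (it won't catch valid-but-wrong data), but records that fail it are guaranteed trouble and worth pulling aside before they hit the new system.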
Re: [CODE4LIB] Metadata war stories...
On 2012-01-25, at 10:06 AM, Becky Yoose wrote:
- Dirty data issues when switching discovery layers or using legacy/vendor metadata (ex. HathiTrust)

I have a sharp recollection of a slide in a presentation Roy Tennant offered up at Access (at Halifax, maybe), where he offered up a range of dates extracted from an array of OAI-harvested records. The good, the bad, the incomprehensible, the useless-without-context (01/02/03, anyone?) and on and on. In my years of migrating data, I've seen most of those variants (except ones *intended* to be BCE).

Then there are the fielded data sets without authority control. My favourite example comes from staff who nominally worked for me, so I'm not telling tales out of school. The classic Dynix product had a Newspaper index module that we used before migrating it (PICK migrations; such a joy). One title had twenty variations on "Georgetown Independent" (I wish I was kidding), and the dates ranged from the early ninth century until nearly the third millennium. (Apparently there hasn't been much change in local council over the centuries.)

I've come to the point where I hand-walk the spatial metadata to links to geonames.org for the linked open data. Never had to do it for a set with more than 40,000 entries, though. The good news is that it isn't hard to establish a valid additional entry when one is required.

Walter
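The twenty-variant problem Walter mentions is often surfaced with a normalization pass before any authority work: collapse case, punctuation, leading articles, and whitespace, then group the raw strings by their normalized key. A sketch with invented variants (the normalization rules here are a starting point, not a standard):

```ruby
# Cluster title variants by a crude normalized key so a human can
# review each cluster once instead of record by record.

TITLES = [
  "Georgetown Independent",
  "GEORGETOWN INDEPENDENT.",
  "Georgetown  Independent",
  "The Georgetown Independent"
]

def normalize(title)
  title.downcase
       .sub(/\Athe\s+/, "")      # drop a leading article
       .gsub(/[[:punct:]]/, "")  # strip punctuation
       .gsub(/\s+/, " ")         # collapse runs of whitespace
       .strip
end

clusters = TITLES.group_by { |t| normalize(t) }
clusters.keys  # => ["georgetown independent"]
```

Each cluster then maps to one authorized heading, with the original strings kept as cross-references; the odd genuine near-duplicate title is why the clusters still need a human pass.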
Re: [CODE4LIB] Metadata war stories...
I was part of a particularly long siege during the METS offensive back in '08. It was brutal. We pretty much ran out of everything and were fighting hand-to-hand before the whole thing was over.

I remember toward the end, while out on requirement-gathering patrol, my team came up on a group of rogue library staff who had separated from their cataloging unit. They were just sitting there, literally a few feet away, taking a chow break. We were heavily outnumbered and outgunned, but it was a dark night, so I hoped we could just lie low and let them pass. But they started talking about how they were plotting a move to take out our dmdSec with some kind of RDF improvised explosive device. I knew this would set us back months and would result in the loss of many of my fellow developers and librarians. So, I ordered my team into action... since we had surprise on our side, we were able to even the numbers by taking out several of their squad. Their manager ordered them to fall back and they retreated up a hill. Several of my team started whooping and hollering like we'd won something, but I knew they were just regrouping to hit back at us.

And, boy, did they ever hit back. We had a prolonged shootout. I knew the longer this went on, the more likely they'd be able to call in reinforcements or possibly get us with a faculty-led napalm strike. So, I made the quick decision to charge their position. We bounded up the hill, taking cover behind trees, rocks, corpses, and whatever we could. We took heavy fire, but we got to the top. And that's when all hell broke loose.

I've killed my fair share of people. In combat, you just learn to live with that. But there's something about strangling someone with your bare hands that just leaves a lasting impression. What happened on that hill comes back to me like nothing else. The screams and the faces and the smell. I talked to the doc and went to some ALA conferences, but whiskey seems to be the only thing that helps.

They say we won that war, but most of the time I'm not sure we did... war's not over for me. It's never over.