Quoting "Moore, Richard" <richard.mo...@bl.uk>:

Hal

The initial work of correlating the data from the LC/NAF and the German
authority files and the associated bibliographic records was so effective
that it revealed thousands of errors in the LC/NAF -- duplicates, false
attributions, errors with undifferentiated name records.

I didn't know that. What was done about the errors?

My information is from a presentation by OCLC's Ed O'Neill, at the ACOC (Australian Committee on Cataloguing) seminar "What's in a Name?" held in Sydney (N.S.W.) in January 2005.

The formal presentation is available (PowerPoint) on the ACOC website <www.nla.gov.au/lis/stndrds/grps/acoc/viaf2005.ppt> and of course relates to the early stages of the project. I've just reviewed it, but the observations I referred to are not part of it, so they must have been delivered off the cuff; since I can't now find my notes, I have only recollection to guide me and cannot be more precise. I was struck by the figures Ed presented, as they confirmed impressions I had formed over the previous several years about lurking errors in the LC/NAF, the LC catalog, and the OCLC database.

Anyway, my recollection is that Ed told us that these apparent errors had been reported to (then) CPSO at LC and were to be reviewed and, where found justified, corrected. IIRC, at that time LC had still not completely refined the tools they use today for bulk changes of headings in their bib records to match authority changes (including reported BFM changes), so the task could have proved very laborious and may never have been carried through. I guess one might inquire of the Policy and Standards Division at LC, whose chief is Dr. Barbara Tillett, herself a member of the VIAF project team and heavily involved, of course, in RDA.

For identifying matches between separate authority files, VIAF relies not only on the information in the authority records but (at least in the initial work, matching DB and LC/NAF names) also on the bibliographic (resource) records in the DB and LC catalogues respectively -- Ed O'Neill's presentation gives a fascinating account of this. I haven't paid enough attention recently to know how far this technique has been carried into the expanded VIAF.
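
Just to illustrate the flavour of that bibliographic-record matching as I understood it, here is a toy sketch in Python. The names, titles, normalisation and threshold are all invented for illustration; the real algorithm Ed described is of course far more sophisticated.

def normalise(title):
    """Crude normalisation: lower-case and keep only letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def title_overlap(titles_a, titles_b):
    """Jaccard similarity between two sets of normalised titles."""
    a = {normalise(t) for t in titles_a}
    b = {normalise(t) for t in titles_b}
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical data: titles on records under one LC/NAF heading and one
# German-file heading that may (or may not) represent the same person.
lc_titles = {"Die Traumdeutung", "Das Ich und das Es", "Totem und Tabu"}
ddb_titles = {"Die Traumdeutung", "Totem und Tabu", "Der Witz"}

if title_overlap(lc_titles, ddb_titles) >= 0.5:  # arbitrary cut-off
    print("Treat the two headings as candidates for the same entity")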

At the time I attended Ed O'Neill's presentation, I was more concerned with ideas of applying similar techniques (I suppose I might call them data mining?) to help identify and consolidate duplicate bibliographic records in the ANBD (Australian National Bibliographic Database), which supports the Libraries Australia service. Perhaps, therefore, I didn't pay as much attention as I might have to the authority-resolving details. But it seems clear to me from what we were given that, by taking broad categories of data (names in headings but also in text fields such as 245 $c, 505 and 508; publisher names in 260 $b; corporate and conference names in 11X/71X; titles in 245 $a, 505, 440/490, 7XX/8XX $t and 830), machine grouping can go a long way towards record matching, and do a lot to identify bad matches or distinguish falsely-matched entities, even when working across different data formats (the DB data was not in MARC 21, and BNF data isn't MARC 21 either). And so I'm left with doubts about whether very fine granularity in our data, as codified in RDA, is really worth the trouble it seems to be causing. Fuzzy logic may even do the job better than too-scarce skilled humans.
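
To make the sort of grouping I have in mind concrete, here is a rough Python sketch. The records, field names and keying are all made up, and real work would need proper MARC (and non-MARC) parsing plus fuzzier comparison, but it shows how coarse normalised keys can throw likely duplicates into the same bucket for human review.

from collections import defaultdict
import re

def norm(s):
    """Lower-case, drop punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", s.lower())).strip()

# Each dict stands in for a bibliographic record, whatever its source format.
records = [
    {"id": "anbd1", "name": "Smith, John", "title": "A history of widgets",  "publisher": "Acme Press"},
    {"id": "anbd2", "name": "Smith, J.",   "title": "A History of Widgets.", "publisher": "Acme"},
    {"id": "anbd3", "name": "Jones, Mary", "title": "Widget economics",      "publisher": "Acme Press"},
]

groups = defaultdict(list)
for rec in records:
    # Key on the normalised title plus the surname part of the name --
    # crude, but enough to bucket likely duplicates for review.
    surname = norm(rec["name"]).split(" ")[0] if rec["name"] else ""
    groups[(norm(rec["title"]), surname)].append(rec["id"])

for key, ids in groups.items():
    if len(ids) > 1:
        print("Possible duplicates:", ids)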

Hal Cain, whose involvement is now minimal
Melbourne, Australia
hec...@dml.vic.edu.au

