Quoting "Moore, Richard" <richard.mo...@bl.uk>:
Hal
The initial work of correlating the data from the LC/NAF and the German
authority files and the associated bibliographic records was so effective
that it revealed thousands of errors in the LC/NAF -- duplicates, false
attributions, errors with undifferentiated name records.
I didn't know that. What was done about the errors?
My information is from a presentation by OCLC's Ed O'Neill, at the
ACOC (Australian Committee on Cataloguing) seminar "What's in a Name?"
held in Sydney (N.S.W.) in January 2005.
The formal presentation is available (PowerPoint) on the ACOC website
<www.nla.gov.au/lis/stndrds/grps/acoc/viaf2005.ppt> and of course
relates to the early stages of the project. I've just reviewed that,
but the observations I referred to are not part of it, so they must
have been delivered off the cuff; since my notes seem not to be
findable, I have only recollection to guide me, and cannot be more
precise. I was struck by the figures Ed presented, as they confirmed
impressions I had formed over the previous several years about lurking
errors in the LC/NAF, the LC catalog, and the OCLC database.
Anyway, my recollection is that Ed told us that these apparent errors
had been reported to (then) CPSO at LC and were to be reviewed and,
where found justified, corrected. IIRC at this time LC had still not
completely refined the tools they use today for bulk changes of
headings in their bib records to match authority changes (including
reported BFM changes), so the task could have proved very laborious
and may never have been carried through. I guess one might inquire of
the Policy and Standards Division at LC, the chief of which is Dr.
Barbara Tillett, herself a member of the VIAF project team and heavily
involved, of course, in RDA.
For identifying matches between separate authority files, VIAF relies
not only on the information in the authority records but (at least in
the initial work, matching DB and LC/NAF names) also on the
bibliographic (resource) records in the DB and LC catalogues
respectively -- Ed O'Neill's presentation gives a fascinating account
of this. I haven't paid enough attention recently to understand how
far this technique has been continued in the expanded VIAF.
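
To make that concrete, here is a minimal sketch of the idea -- mine,
purely illustrative, and emphatically not OCLC's algorithm; the sample
titles, the 0.9 similarity threshold, and the one-shared-title rule are
all my own assumptions. It confirms a tentative name match by comparing
the titles of the bibliographic records attached to each candidate
(Python, standard library only):

    import difflib

    def normalise(title):
        # Crude normalisation: lowercase, keep only letters and digits.
        return "".join(ch for ch in title.lower() if ch.isalnum())

    def shared_titles(titles_a, titles_b, threshold=0.9):
        # Count titles from catalogue A with a near-match in catalogue B.
        count = 0
        for ta in titles_a:
            for tb in titles_b:
                ratio = difflib.SequenceMatcher(
                    None, normalise(ta), normalise(tb)).ratio()
                if ratio >= threshold:
                    count += 1
                    break
        return count

    # Two candidate records for the "same" name, one from each file,
    # each carrying the titles of the bib records attached to it:
    lc_titles = ["Gold rush ballads", "The riverine trade of Victoria"]
    db_titles = ["Gold rush ballads", "Die Goldfelder Australiens"]
    if shared_titles(lc_titles, db_titles) >= 1:
        print("probably the same entity")

The real matching is of course far more elaborate; this is only the
shape of the idea.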
At the time I attended Ed O'Neill's presentation, I was more concerned
with ideas of applying similar techniques (I suppose I might call them
data mining?) to help identify and consolidate duplicate bibliographic
records in the ANBD (Australian National Bibliographic Database) which
supports the Libraries Australia service. Therefore perhaps I didn't
pay as much attention as I might have to the authority-resolving
details. But it seems clear to me from what we were given that, by
taking broad categories of data (names in headings but also in text
fields: 245 $c, 505, 508; publisher names in 260 $b; corporates and
conferences in 11X/71X; titles in 245 $a, 505, 440/490, 7XX/8XX $t,
830), machine grouping can go a long way towards record matching, and
do a lot to identify bad matches or distinguish falsely-matched
entities, even when working across different data formats (DB data was
not in MARC 21, and neither is BNF data). And
therefore I'm left with doubts about whether very fine granularity in
our data, as codified in RDA, is really worth the trouble it seems to
be causing. Fuzzy logic may even do the job better than too-scarce
skilled humans.
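
For what it's worth, the kind of thing I had in mind for duplicate
detection runs roughly like this -- again my own sketch, with toy
records standing in for parsed MARC and an arbitrary 0.8 similarity
cutoff, not anything Libraries Australia actually runs. Group records
on a cheap "blocking" key built from the title, then let fuzzy
comparison sort out each group:

    import difflib
    from collections import defaultdict

    # Toy stand-ins for parsed MARC records; the fields mirror the
    # broad categories above (title from 245 $a, name from 1XX/7XX).
    records = [
        {"id": 1, "title": "A history of the gold rush", "name": "Cain, Hal"},
        {"id": 2, "title": "A History of the Gold-Rush", "name": "Cain, H."},
        {"id": 3, "title": "What's in a name?", "name": "O'Neill, Ed"},
    ]

    def block_key(rec):
        # Blocking key: first eight normalised characters of the title.
        norm = "".join(ch for ch in rec["title"].lower() if ch.isalnum())
        return norm[:8]

    # First pass: cheap grouping on the blocking key...
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # ...second pass: fuzzy comparison within each group to confirm.
    for group in blocks.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                ratio = difflib.SequenceMatcher(
                    None, a["title"].lower(), b["title"].lower()).ratio()
                if ratio > 0.8:
                    print(f"possible duplicates: {a['id']} and {b['id']}")

Crude, obviously -- but it is exactly this kind of simple grouping plus
fuzzy comparison that seems to get surprisingly far without very
fine-grained data.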
Hal Cain, whose involvement is now minimal
Melbourne, Australia
hec...@dml.vic.edu.au