Quoting "Moore, Richard" <richard.mo...@bl.uk>:
Hal
The initial work of correlating the data from the LC/NAF and the German
authority files and the associated bibliographic records was so effective
that it revealed thousands of errors in the LC/NAF -- duplicates, false
attributions, errors with undifferentiated name records.
I didn't know that. What was done about the errors?
My information is from a presentation by OCLC's Ed O'Neill, at the
ACOC (Australian Committee on Cataloguing) seminar "What's in a Name?"
held in Sydney (N.S.W.) in January 2005.
The formal presentation is available (PowerPoint) on the ACOC website
<www.nla.gov.au/lis/stndrds/grps/acoc/viaf2005.ppt> and of course
relates to the early stages of the project. I've just reviewed that,
but the observations I referred to are not part of it, so they must
have been delivered off the cuff; since my notes seem not to be
findable, I have only recollection to guide me, and cannot be more
precise. I was struck by the figures Ed presented, as they confirmed
impressions I had formed over the previous several years about lurking
errors in the LC/NAF, the LC catalog, and the OCLC database.
Anyway, my recollection is that Ed told us that these apparent errors
had been reported to (then) CPSO at LC and were to be reviewed and,
where found justified, corrected. IIRC at this time LC had still not
completely refined the tools they use today for bulk changes of
headings in their bib records to match authority changes (including
reported BFM changes), so the task could have proved very laborious
and may never have been carried through. I guess one might inquire of
the Policy and Standards Division at LC, the chief of which is Dr.
Barbara Tillett, herself a member of the VIAF project team and heavily
involved, of course, in RDA.
For identifying matches between separate authority files, VIAF relies
not only on the information in the authority records but (at least in
the initial work, matching DB and LC/NAF names) also on the
bibliographic (resource) records in the DB and LC catalogues
respectively -- Ed O'Neill's presentation gives a fascinating account
of this. I haven't paid enough attention recently to understand how
far this technique has been continued in the expanded VIAF.
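
To make that concrete, here is a minimal sketch of the idea -- mine,
purely illustrative, and emphatically not OCLC's algorithm; the sample
titles, the 0.9 similarity threshold, and the one-shared-title rule are
all my own assumptions. It confirms a tentative name match by comparing
the titles of the bibliographic records attached to each candidate
(Python, standard library only):

    import difflib

    def normalise(title):
        # Crude normalisation: lowercase, keep only letters and digits.
        return "".join(ch for ch in title.lower() if ch.isalnum())

    def shared_titles(titles_a, titles_b, threshold=0.9):
        # Count titles from catalogue A with a near-match in catalogue B.
        count = 0
        for ta in titles_a:
            for tb in titles_b:
                ratio = difflib.SequenceMatcher(
                    None, normalise(ta), normalise(tb)).ratio()
                if ratio >= threshold:
                    count += 1
                    break
        return count

    # Two candidate records for the "same" name, one from each file,
    # each carrying the titles of the bib records attached to it:
    lc_titles = ["Gold rush ballads", "The riverine trade of Victoria"]
    db_titles = ["Gold rush ballads", "Die Goldfelder Australiens"]
    if shared_titles(lc_titles, db_titles) >= 1:
        print("probably the same entity")

The real matching is of course far more elaborate; this is only the
shape of the idea.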
At the time I attended Ed O'Neill's presentation, I was more concerned
with ideas of applying similar techniques (I suppose I might call them
data mining?) to help identify and consolidate duplicate bibliographic
records in the ANBD (Australian National Bibliographic Database) which
supports the Libraries Australia service. Therefore perhaps I didn't
pay as much attention as I might have to the authority-resolving
details. But it seems clear to me from what we were given that, by
taking broad categories of data (names in headings but also in text
fields: 245 $c, 505, 508; publisher names in 260 $b; corporates and
conferences in 11X/71X; titles in 245 $a, 505, 440/490, 7XX/8XX $t,
830), machine grouping can go a long way towards record matching, and
do a lot to identify bad matches or distinguish falsely-matched
entities, even when working across different data formats (DB data was
not in MARC 21, and neither is BNF data). And
therefore I'm left with doubts about whether very fine granularity in
our data, as codified in RDA, is really worth the trouble it seems to
be causing. Fuzzy logic may even do the job better than too-scarce
skilled humans.
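
For what it's worth, the kind of thing I had in mind for duplicate
detection runs roughly like this -- again my own sketch, with toy
records standing in for parsed MARC and an arbitrary 0.8 similarity
cutoff, not anything Libraries Australia actually runs. Group records
on a cheap "blocking" key built from the title, then let fuzzy
comparison sort out each group:

    import difflib
    from collections import defaultdict

    # Toy stand-ins for parsed MARC records; the fields mirror the
    # broad categories above (title from 245 $a, name from 1XX/7XX).
    records = [
        {"id": 1, "title": "A history of the gold rush", "name": "Cain, Hal"},
        {"id": 2, "title": "A History of the Gold-Rush", "name": "Cain, H."},
        {"id": 3, "title": "What's in a name?", "name": "O'Neill, Ed"},
    ]

    def block_key(rec):
        # Blocking key: first eight normalised characters of the title.
        norm = "".join(ch for ch in rec["title"].lower() if ch.isalnum())
        return norm[:8]

    # First pass: cheap grouping on the blocking key...
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # ...second pass: fuzzy comparison within each group to confirm.
    for group in blocks.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                ratio = difflib.SequenceMatcher(
                    None, a["title"].lower(), b["title"].lower()).ratio()
                if ratio > 0.8:
                    print(f"possible duplicates: {a['id']} and {b['id']}")

Crude, obviously -- but it is exactly this kind of simple grouping plus
fuzzy comparison that seems to get surprisingly far without very
fine-grained data.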
Hal Cain, whose involvement is now minimal
Melbourne, Australia
hec...@dml.vic.edu.au