I'm evaluating Linux software for retrieving bibliographic data. A great package is YAZ, a compact toolkit that provides access to the Z39.50 protocol. It comes with the command-line tool yaz-client, which lets you query a Z39.50 server, e.g. the Library of Congress at z3950.loc.gov:7090/voyager .
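For anyone who wants to reproduce this, a session against the target above can be scripted. What follows is only a minimal sketch: it assumes yaz-client is on the PATH and accepts its commands on standard input when not run interactively, and the names build_session and run_session are hypothetical helpers, not part of YAZ. The commands themselves (format, find, show, quit) are ordinary yaz-client commands, and the query is PQF with Bib-1 use attribute 4 (title).

```python
import subprocess

# Library of Congress target mentioned in the text above.
TARGET = "z3950.loc.gov:7090/voyager"


def build_session(title, count=1):
    """Return yaz-client commands that search by title and show `count` records.

    Hypothetical helper: `title` must be plain ASCII without double quotes.
    """
    commands = [
        "format usmarc",              # request MARC as the record syntax
        f'find @attr 1=4 "{title}"',  # PQF query; Bib-1 use attribute 4 = title
        f"show 1+{count}",            # display `count` records starting at record 1
        "quit",
    ]
    return "\n".join(commands) + "\n"


def run_session(title, count=1, target=TARGET):
    """Run yaz-client against `target`, feeding the commands on stdin.

    Assumption: yaz-client reads commands from stdin when it is not a terminal.
    Returns the raw output bytes without interpreting their encoding.
    """
    result = subprocess.run(
        ["yaz-client", target],
        input=build_session(title, count).encode("ascii"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Calling run_session("linux") would return the raw bytes printed by yaz-client. The sketch deliberately leaves them undecoded, because how the diacritics in those bytes are encoded is exactly the open question.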
Unfortunately, there is a problem with the encoding. According to http://lcweb.loc.gov/z3950/lcserver.html : "The server does not support the complete MARC 21 character set. Diacritics and many special characters are not encoded correctly. Diacritics and special characters are currently being converted to a proprietary Endeavor Latin-1 representation."

Apparently the software and the file format are able to represent all the encoding information, but the server isn't able to make this information properly available. Thus, tools like http://www.ece.arizona.edu/~denny/python_nest/MARCxUDC.tar.gz will fail.

A Google search makes me believe this problem has been known for two years (at least!); what can we do about it? Is it possible to map the "proprietary Endeavor Latin-1 representation" to UTF-8? Are there other Z39.50 servers that can be used as a fallback (London, Paris)? German university libraries do not seem to offer direct Z39.50 access, only Web gateways :-(

--
[EMAIL PROTECTED] (work) / [EMAIL PROTECTED] (home)
http://www.suse.de/~ke/
Free Translation Project: http://www.iro.umontreal.ca/contrib/po/HTML/
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
