Re: [librecat-dev] A common MARC record path language
On 25.02.2014 12:50, PHILLIPS M.E. wrote:

> If we are just talking about ISO 2709, the whole family of MARC formats in general, then you have to remember that UNIMARC and obsolete formats like UKMARC have very different requirements.
>
> UKMARC and UNIMARC are actually much easier to work with than MARC21 because the ISBD punctuation is not carried in the record but is generated from the subfield tags. So you don't have to say "give me the 245 $a and $b but strip / off the end if present", because the slash is not there.

The same holds for MARC21: the punctuation regime for the record is governed by Leader pos. 18 (descriptive cataloging form), which currently gives the choice between mainly AACR2, ISBD with punctuation and ISBD without punctuation (and does not yet have code(s) for RDA).

Here in Germany there is a strong tradition that cataloguers do not enter punctuation when the field granularity of the underlying database allows it to be generated automatically for display or for conversion to other formats (what I mean is: punctuation is generated when converting from the internal format to MARC in cases where MARC is not as granular as the internal format). This applies to RAK data in the union databases and its transport via MAB2 or MARC21, and the intention is to carry this on when switching from RAK to RDA.

[There is also a rule for the D-A-CH application layer to move punctuation which cannot be eliminated to the start of the subfield it belongs to, e.g.

  245 $a title = $b parallel title

becomes

  245 $a title $b = parallel title

probably in the expectation that this could ease processing...]
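The D-A-CH rule quoted in brackets above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the representation of a field as (code, value) pairs and the set of punctuation marks treated as "movable" are all hypothetical and not taken from any application profile.

```python
import re

# Assumed set of trailing ISBD marks that announce the *next* element.
TRAILING_MARK = re.compile(r"\s*([=:/;])\s*$")


def move_punctuation_forward(subfields):
    """Move a trailing ISBD mark from the end of one subfield
    to the start of the subfield that follows it.

    `subfields` is a list of (code, value) tuples, a hypothetical
    in-memory form of a single MARC field.
    """
    result = []
    pending = ""  # mark stripped from the previous subfield, if any
    for code, value in subfields:
        if pending:
            value = f"{pending} {value}"
            pending = ""
        match = TRAILING_MARK.search(value)
        if match:
            pending = match.group(1)
            value = value[:match.start()]
        result.append((code, value))
    if pending:
        # Nothing follows: put the mark back where it came from.
        code, value = result[-1]
        result[-1] = (code, f"{value} {pending}")
    return result


field_245 = [("a", "title ="), ("b", "parallel title")]
print(move_punctuation_forward(field_245))
```

With the example from the message, `$a title = $b parallel title` comes out as `$a title $b = parallel title`. Whether a real converter should also touch marks such as "." or "," is a policy question this sketch does not try to answer.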
Best regards,
Thomas Berger
Re: [librecat-dev] A common MARC record path language
Carsten,

> Thank you both for bringing the discussion forward. I must admit that I'm having some problems following here. I read your mails multiple times, really trying to understand your demands. After reading this [1], I hope I'm getting closer.

You could also consider reading Jason Thomale's "Interpreting MARC: Where's the Bibliographic Data?" http://journal.code4lib.org/articles/3832 (which precedes Karen Coyle's more widely known article "MARC21 as Data: A Start" http://journal.code4lib.org/articles/5468 ).

> I just want to sum up what I think I've understood so far. Please correct me if I'm wrong.
>
> -- When it comes to cataloging-based delimiters (punctuation), there is some inner semantic to the content of the subfields. E.g. "=$b" in field 245 means something different than ":$b".

Yes and no: in your example

  787 08 $ireproduction of (manifestation) $aVerdi, Giuseppe, 1813-1901. $tOtello.$d Milano: Ricordi, c1913

three of the four subfields have internal structure which is likely to be exploited, as in:

- display "$ireproduction of (manifestation)" without the text in parentheses, as the left column in a table or styled differently (introductory phrase in italics and/or followed by a colon)
- display "$aVerdi, Giuseppe, 1813-1901." as "Verdi, Giuseppe, (1813-1901)", followed by a colon if $t is next
- display the title "$tOtello." in italics, index it somewhere
- extract the place "Milano" from "$dMilano: Ricordi, c1913" (the part before ":")
- display copyright signs more nicely than "c" (applies to 787$d).

245$b is only a notorious example of a subfield that not only combines several concepts, as 787$d does, but also has no fixed first one. Its meaning therefore has to be deduced from punctuation, which unfortunately (but as usual) sits not in the subfield itself but immediately preceding it. Furthermore, the ensemble $a+$t+$d constitutes a unit (*one* citation) which in many cases should not be torn apart.
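To make the point about internal structure concrete, here is a rough Python sketch of two of the operations listed above for the 787 example: dropping the parenthetical from $i and splitting $d on its ISBD punctuation. The function names are made up for illustration, and the code assumes that $d really has the shape "place: publisher, date"; real data will often not be that regular.

```python
import re


def relation_label(subfield_i: str) -> str:
    """Return $i without any parenthesized qualifier such as '(manifestation)'."""
    return re.sub(r"\s*\([^)]*\)", "", subfield_i).strip()


def split_imprint(subfield_d: str) -> dict:
    """Split an imprint like 'Milano: Ricordi, c1913' into its parts.

    Assumes all three parts are present: the place ends at the first ':',
    the date starts after the last ','.
    """
    place, _, rest = subfield_d.strip().partition(":")
    publisher, _, date = rest.rpartition(",")
    # Show a leading 'c' before a four-digit year as a copyright sign.
    date = re.sub(r"^c(?=\d{4})", "\u00a9", date.strip())
    return {"place": place.strip(), "publisher": publisher.strip(), "date": date}


print(relation_label("reproduction of (manifestation)"))
print(split_imprint(" Milano: Ricordi, c1913"))
```

Note that both helpers work on the text inside a subfield, which is exactly the kind of processing a path expression that stops at subfield boundaries cannot describe.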
[There's also the case of 100$c as a kind of unspecific container for any of several different classes of information to be injected into the heading according to AACR2 or RDA: professions, bynames, indications of rank etc. But there is no non-MARC markup except ",", and it's almost impossible to reverse-engineer $c to the factual information ("Spanish king") underlying the heading.]

> -- There may be data you want to get as a whole which is spread over multiple subfields. This information cannot be described by a range of subfields, but by the closure through punctuation. E.g. in the field
>
>   245 00$aHeritage Books archives.$pUnderwood biographical dictionary.$nVolumes 1 2 revised$h[electronic resource] /$cLaverne Galeener-Moore.
>
> the data you want to get is
>
>   Heritage Books archives. Underwood biographical dictionary. Volumes 1 2 revised [electronic resource]

I think 245 is one of the many cases where specific information can be /deduced/ from (MARC and ISBD) markup in the field, but it would be dangerous to state that e.g. 245$h /contains data/. It is tempting to speak or think in terms of "subfield content", i.e. something data-like which is implicitly terminated by the next subfield mark. But the "/" actually does not belong to $h when attempting to view it as data; it is just an indication that the next subfield mark to follow will probably be $c. Thus 245 is, in XML lingo, "mixed content" with most of the prescribed punctuation /outside/ the child data elements. As usual, MARCspec too cannot boldly declare that the permissible results should be regarded as "the text" or "the data": both views are legitimate and have to be taken into account.

To achieve the string you just gave is either trivial (prevalent AACR2 practice with ISBD punctuation always provided in the record: fetch the field and substitute $.? by a single space) or involves much magic (coming D-A-CH practice with ISBD punctuation generally not provided: fetch the field, analyze the subfield marks and enhance it with proper ISBD punctuation).

[O.k., I see: you either stripped $c from it, or the content after "/", or the specific constellation of the trailing "/" immediately preceding $anything or specifically $c. ISBD knows about a parallel statement of responsibility, as in "Our Mission / by Corporate Body A = Notre Commande / par Corporation A", but I don't know offhand how this is coded in AACR2+MARC for current examples.]

And, as I'm not a typewriter, I would rather process the content of 245 with the help of the semantic clues given by the MARC encoding. Something with only "." as remaining delimiters is not much help. (And retrieving more refined components like $a, $b etc. afterwards and matching them to specific parts of the combined string above seems to be very much work, comparable to automatic tagging of OCR results.)

> Is this what you mean when you want to say something like "Get me all from field XXX until you hit Y"?

I guess so. As I understand the purpose of MARCspec, it is kind of "hit and run": It is not a MOM (MARC Object
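The "trivial" variant discussed in this message (ISBD punctuation is present in the record, so the subfield marks only need to be replaced by a space) might look roughly like the following Python sketch. It works on the "$x" display notation used in the mails, not on real ISO 2709 delimiters, and the function name and the decision to cut at a "/" directly before $c are assumptions made for illustration.

```python
import re

# '$' followed by one subfield code, in the display notation of the mails.
SUBFIELD_MARK = re.compile(r"\$[a-z0-9]")


def title_without_responsibility(field: str) -> str:
    """Flatten a 245 field and drop everything from ' /$c' onwards."""
    # "Get me all ... until you hit Y": here Y is a slash right before $c.
    head = re.split(r"\s*/\s*\$c", field, maxsplit=1)[0]
    text = SUBFIELD_MARK.sub(" ", head)
    return re.sub(r"\s+", " ", text).strip()


field_245 = (
    "$aHeritage Books archives.$pUnderwood biographical dictionary."
    "$nVolumes 1 2 revised$h[electronic resource] /$cLaverne Galeener-Moore."
)
print(title_without_responsibility(field_245))
```

For the example field this yields the string asked for in the quoted mail. The sketch breaks as soon as the punctuation is not carried in the record, which is the case the message calls "much magic".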
Re: MARC::XML failing on CPAN testers
Hi Jakob,

> MARC::XML is heavily failing on CPAN Tester: http://www.cpantesters.org/distro/M/MARC-XML.html
>
> My PICA::Record module is failing for the same reason (XML Namespace support for XML::SAX). I don't know how to fix this, so I asked here: http://www.perlmonks.org/?node_id=948311
>
> Maybe someone here knows better?

Sure. I had similar issues in the past when I started requesting a namespace-aware parser from the factory (in order to do it right, i.e. I wanted to avoid failing tests because of hypothetically inappropriate parsers). IIRC the situation is as follows:

- In the absence of ParserDetails.ini, XML::SAX falls back to providing XML::SAX::PurePerl *only* when the parser has been requested without specific properties.
- ParserDetails.ini is *not* set up on automated / unattended installs of XML::SAX (as when a smoke tester's environment pulls it in as a dependency of your module).

Unfortunately I have no clue how this might be fixed (apart from changing the installation behavior of XML::SAX, which might be as it is for a reason).

HTH
Thomas Berger
Re: OAI::Harvester installation help
Hello,

thanks to further input from the original reporter I (as co-maintainer of N:O:H) have been able to sort out the issues illustrated by the report:

- One of the repositories used in the test suite recently changed its address, causing some tests to fail: fixed.
- The LibXML family of parsers behaves very noisily in the test for illegal XML, making it hard to notice that the test actually succeeds: not fixed.

Two issues with XML::SAX might be of broader interest:

[not applicable to this thread:
- ParserDetails.ini
  XML::SAX (::ParserFactory) uses a text file ParserDetails.ini located in the folder SAX.pm resides in or (Debian only?) under /etc/perl/... This file contains the list of known parsers in this installation and their properties. I have noticed several times that this file was not generated (because of non-interactive dependency installs?) and that subsequently installed individual parsers were not registered. My impression is that the XML::SAX framework should fall back to XML::SAX::PurePerl, which is installed by the package itself, but this does not seem to happen in the Net::OAI::Harvester test suite (maybe because the parser is requested with a required feature).
  Status: not tackled yet
  cf. http://perl-xml.sourceforge.net/faq/#parserdetails.ini
]

- Latest version of XML::SAX::Base
  Some sub-modules of Net::OAI::Harvester use the get_handler() method supplied by XML::SAX::Base as of version 1.04. This module is literally hidden in the XML::SAX distribution (the source is generated by Makefile.PL to prevent indexing of the module on CPAN). There is a line of standalone versions of XML::SAX::Base on CPAN ending at version 1.02, which does not contain the method in question. (The README of this standalone module warns that you probably do not want to install this module but the complete XML::SAX framework.)
When you explicitly request installation of XML::SAX::Base, there is a chance that this fetches version 1.02, which then takes precedence over or actually overwrites version 1.04 installed by XML::SAX, and there is absolutely no upgrade path: XML::SAX::Base 1.02 must be uninstalled/removed for things to work again.

For Net::OAI::Harvester I have refined the requirement of XML::SAX::Base to the specific version 1.04, and I'm waiting for the CPAN Tester reports to come in: it might well be that more systems than before are lured into installing the wrong module when performing "Build installdeps" and thus effectively cut themselves off from executing the tests at all.

Thomas Berger

On 18.05.2011 09:14, Saiful Amin wrote:

> > I am attempting to use some code which depends on Net::OAI::Harvester, but my attempts to install OAI::Harvester are running into problems with:
> >
> > Any suggestions for getting this installed properly? I'm assuming that this is a case where a simple force install isn't going to get me a working installation...
>
> I've used Net::OAI::Harvester on both Ubuntu and Windows XP for my projects. On XP I've used Strawberry Perl, in which installing using CPAN luckily worked without any problem.
>
> In Ubuntu, I had to install from synaptic when CPAN failed. If I remember correctly, the command was:
>
>   # sudo apt-get install libnet-oai-harvester-perl
>
> It works better than cpan in managing dependencies in my experience.
>
> Regards,
> Saiful Amin