Re: [librecat-dev] A common MARC record path language

2014-02-25 Thomas Berger

On 25.02.2014 12:50, PHILLIPS M.E. wrote:

 If we are just talking about ISO 2709, the whole family of MARC formats in
 general, then you have to remember that UNIMARC and obsolete formats like
 UKMARC have very different requirements. UKMARC and UNIMARC are actually much
 easier to work with than MARC21 because the ISBD punctuation is not carried in
 the record but is generated from the subfield tags. So you don't have to say
 "give me the 245 $a and $b but strip / off the end if present" because the
 slash is not there.

Same thing with MARC21: the punctuation regime for a record is governed by Leader
position 18 (Descriptive cataloging form), which currently gives the choice between
mainly AACR2, ISBD with punctuation and ISBD without punctuation, and does
not yet have code(s) for RDA.
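A minimal Python sketch of reading that regime from the leader, assuming the Leader/18 code values as listed in the MARC21 Bibliographic Leader documentation (" " non-ISBD, "a" AACR2, "c" ISBD punctuation omitted, "i" ISBD punctuation included, "n" non-ISBD punctuation omitted, "u" unknown). The function name `punctuation_in_record` is hypothetical, and treating "a" and "i" records as carrying punctuation is the usual expectation rather than a guarantee for every record.

```python
from typing import Optional

# MARC21 Leader/18 (Descriptive cataloging form) values.
LEADER_18 = {
    " ": "non-ISBD",
    "a": "AACR2",
    "c": "ISBD punctuation omitted",
    "i": "ISBD punctuation included",
    "n": "non-ISBD punctuation omitted",
    "u": "unknown",
}


def punctuation_in_record(leader: str) -> Optional[bool]:
    """True if the record itself should carry ISBD punctuation, False if it
    was deliberately omitted (so it must be generated on output), None if
    Leader/18 does not settle the question."""
    if len(leader) != 24:
        raise ValueError("a MARC21 leader has exactly 24 characters")
    code = leader[18]
    if code not in LEADER_18:
        raise ValueError(f"undefined Leader/18 value: {code!r}")
    if code in ("a", "i"):
        return True
    if code in ("c", "n"):
        return False
    return None  # " " or "u": inspect the field data itself
```

A consumer would call this once per record and only then decide whether to strip punctuation from the fields or to generate it.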

Here in Germany there is a strong tradition that cataloguers shall not enter
punctuation when the field granularity of the underlying database allows its
automatic generation for display or conversion to other formats
(what I mean is: punctuation is generated when converting from the internal
format to MARC in cases where MARC is not as granular as the internal format).

This applies to RAK data in the union databases and its transport via MAB2 or
MARC21, and it is also the intention to carry this on when switching from RAK
to RDA.

[There has also been a regulation for the D-A-CH application layer to move
punctuation that cannot be eliminated to the start of the subfield it
belongs to, e.g.

245 $a title = $b parallel title

becomes

245 $a title $b = parallel title

probably in the expectation that this would ease processing...]
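The D-A-CH convention can be sketched as a transformation on a list of (code, value) pairs. This is an illustrative Python sketch, not anything taken from the D-A-CH documentation: the helper `move_marks_forward` is hypothetical, and the set of marks it moves (= : ; /) is an assumption.

```python
import re
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, subfield value)

# A prescribed ISBD mark left dangling at the end of a subfield: whitespace,
# one of = : ; /, optional trailing whitespace. The set of marks is assumed.
TRAILING_MARK = re.compile(r"\s+([=:;/])\s*$")


def move_marks_forward(subfields: List[Subfield]) -> List[Subfield]:
    """Move a mark that ends one subfield to the start of the next one:
    [("a", "title ="), ("b", "parallel title")]
    becomes [("a", "title"), ("b", "= parallel title")]."""
    values = [value for _code, value in subfields]
    for i in range(len(values) - 1):
        match = TRAILING_MARK.search(values[i])
        if match:
            values[i] = values[i][: match.start()]
            values[i + 1] = f"{match.group(1)} {values[i + 1]}"
    return [(code, value) for (code, _old), value in zip(subfields, values)]
```

A mark ending the last subfield is left alone, since there is no following subfield to carry it.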

Best regards
Thomas Berger


Re: [librecat-dev] A common MARC record path language

2014-02-24 Thomas Berger
Carsten,


 Thank you both for bringing the discussion forward. I must admit that I'm
 having some problems following here. I read your mails multiple times, really
 trying to understand your demands. After reading this [1], I hope I'm getting
 closer.

You could also try to grok Jason Thomale's "Interpreting MARC: Where's the
Bibliographic Data?" http://journal.code4lib.org/articles/3832 (preceding
Karen Coyle's more widely known article "MARC21 as Data: A Start"
http://journal.code4lib.org/articles/5468).


 I just want to sum up what I think I've understood so far. Please correct me
 if I'm wrong.

 -- When it comes to cataloging-based delimiters (punctuation), there are some
 inner semantics to the content of the subfields. E.g. "=$b" in field 245 means
 something different from ":$b".

Yes and no: In your example

787  08 $ireproduction of (manifestation) $aVerdi, Giuseppe, 1813-1901.
$tOtello.$d Milano: Ricordi, c1913

three of the four subfields have an internal structure which is likely to
be exploited, e.g.:

display "$ireproduction of (manifestation)" without the text in parentheses,
as the left column in a table or styled differently (introductory phrase
in italics and/or followed by a colon)

display "$aVerdi, Giuseppe, 1813-1901." as "Verdi, Giuseppe, (1813-1901)",
followed by a colon if $t comes next

display the title "$tOtello." in italics, index it somewhere

extract the place "Milano" from "$dMilano: Ricordi, c1913" (the part before ":")

display copyright signs more nicely than "c" (applies to 787$d).
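Each of these display tasks is a small string operation. A hedged Python sketch, assuming the subfield values have already been extracted from the field; all helper names are hypothetical, and the regular expressions only cover the shapes seen in this one example. The parts are returned together so that $a, $t and $d still form one citation.

```python
import re


def strip_parenthetical(text: str) -> str:
    # "reproduction of (manifestation)" -> "reproduction of"
    return re.sub(r"\s*\([^)]*\)", "", text).strip()


def name_with_dates(name: str) -> str:
    # "Verdi, Giuseppe, 1813-1901." -> "Verdi, Giuseppe, (1813-1901)"
    match = re.match(r"^(.*?),\s*(\d{4}-(?:\d{4})?)\.?$", name.strip())
    return f"{match.group(1)}, ({match.group(2)})" if match else name.strip()


def copyright_sign(text: str) -> str:
    # "c1913" -> "\u00a91913", only where "c" directly precedes a 4-digit year
    return re.sub(r"\bc(?=\d{4}\b)", "\u00a9", text)


def render_787(i: str, a: str, t: str, d: str) -> dict:
    """Split one 787 field into display parts, keeping $a+$t+$d as a unit."""
    title = t.strip().rstrip(".")
    heading = name_with_dates(a)
    return {
        "label": strip_parenthetical(i),
        # colon between name and title only when $t is present
        "heading": f"{heading}: {title}" if title else heading,
        "title_for_index": title,
        "place": d.split(":", 1)[0].strip(),
        "imprint": copyright_sign(d.strip()),
    }
```

For the field above, `render_787("reproduction of (manifestation)", "Verdi, Giuseppe, 1813-1901.", "Otello.", " Milano: Ricordi, c1913")` yields the label "reproduction of", the heading "Verdi, Giuseppe, (1813-1901): Otello" and the place "Milano".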


245$b is only a notorious example of a subfield that not only combines
several concepts, as 787$d does, but also has no fixed first one, so its
meaning has to be deduced from punctuation information, which is
unfortunately (but as usual) not in the subfield itself but immediately
precedes it.
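A sketch of what deducing the meaning of 245$b from neighbouring punctuation looks like in Python. It only distinguishes "=" (parallel title) and ":" (other title information), checks the end of the preceding subfield first and then the start of $b itself (for records that move the mark into the subfield), and returns None otherwise; the function name and the reduction to two marks are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Only the two marks named in the discussion are mapped; anything else is
# left undetermined rather than guessed.
MARK_MEANING = {
    "=": "parallel title",
    ":": "other title information",
}


def classify_245b(subfields: List[Tuple[str, str]]) -> Optional[str]:
    for i, (code, value) in enumerate(subfields):
        if code != "b":
            continue
        before = subfields[i - 1][1].rstrip() if i > 0 else ""
        # MARC21/AACR2 habit: the mark ends the preceding subfield.
        if before and before[-1] in MARK_MEANING:
            return MARK_MEANING[before[-1]]
        # D-A-CH habit: the mark opens $b itself.
        after = value.lstrip()
        if after and after[0] in MARK_MEANING:
            return MARK_MEANING[after[0]]
        return None
    return None  # no $b at all
```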

Furthermore the ensemble $a+$t+$d constitutes a unit (*one* citation) which
in many cases should not be torn apart.

[There's also the case of 100$c as a kind of unspecific container for
any of the several different classes of information to be injected into
the heading according to AACR2 or RDA: professions, bynames, indications
of rank etc. But there is no non-MARC markup except "," and it's almost
impossible to reverse engineer $c to the factual information ("Spanish
king") underlying the heading.]


 -- There may be data you want to get at as a whole, which is spread over
 multiple subfields. This information cannot be described by a range of
 subfields, but by the closure through punctuation. E.g. in the field

 245   00$aHeritage Books archives.$pUnderwood biographical dictionary.$nVolumes 1 & 2 revised$h[electronic resource] /$cLaverne Galeener-Moore.

 the data you want to get is

 Heritage Books archives. Underwood biographical dictionary. Volumes 1 & 2 revised [electronic resource]

I think 245 is one of the many cases where specific information can be
/deduced/ from (MARC and ISBD) markup in the field, but it would be
dangerous to state that e.g. 245$h /contains data/. It is tempting to
speak or think in terms of subfield content, i.e. something data-like
which is implicitly terminated by the next subfield mark: the " /"
actually does not belong to $h when attempting to view it as data; it's
just an indication that the next subfield mark will probably be $c.
Thus 245 is, in XML lingo, mixed content, with most of the prescribed
punctuation /outside/ the child data elements. As usual, MARCspec
cannot boldly declare whether the permissible results should be regarded
as the text or as the data: both views are legitimate and have to be
taken into account.
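The two legitimate views can be made concrete with a small Python sketch over (code, value) pairs: a text view that keeps the punctuation sitting between elements, and a data view that strips one trailing ISBD-like mark from each subfield. The function names and the set of marks treated as boundary punctuation are assumptions; a real data view would also have to avoid stripping periods that belong to abbreviations.

```python
import re
from typing import List, Tuple

Subfield = Tuple[str, str]

# One ISBD-like mark at the very end of a subfield (assumed set of marks).
BOUNDARY_MARK = re.compile(r"\s*[=:;/.,]\s*$")


def text_view(subfields: List[Subfield]) -> str:
    """Mixed-content view: all text, including the punctuation between the
    'data' subfields, joined in record order."""
    return " ".join(value.strip() for _code, value in subfields)


def data_view(subfields: List[Subfield]) -> List[Subfield]:
    """Element view: each subfield without the trailing boundary mark, as if
    that mark lived between the elements rather than inside them."""
    return [(code, BOUNDARY_MARK.sub("", value)) for code, value in subfields]
```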

Producing the string you just gave is either trivial (prevalent AACR2 practice,
with ISBD punctuation always provided in the record: fetch the field and
substitute every match of $.? by a single space) or involves a lot of magic
(upcoming D-A-CH practice, with ISBD punctuation generally not provided: fetch
the field, analyze the subfield marks and enhance it with proper ISBD
punctuation).
[O.k., I see: you stripped either $c, or the content after " /", or the
specific constellation of a trailing " /" immediately preceding any subfield
($anything) or specifically $c. ISBD knows about a parallel statement
of responsibility, as in "Our Mission / by Corporate Body A = Notre
Commande / par Corporation A", but I don't know offhand how this is coded
in AACR2+MARC for current examples.]
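The trivial AACR2 case amounts to two regular-expression substitutions. This Python sketch assumes "$" stands for the subfield delimiter as written in these mails (real records use the 0x1F delimiter), uses `\$.` for the $.? pattern, and drops $c together with the " /" announcing it; `title_string` is a hypothetical name.

```python
import re


def title_string(field: str) -> str:
    """AACR2 case: ISBD punctuation is already in the record, so drop $c
    (plus the " /" announcing it) and turn every other subfield mark
    "$x" into a single space."""
    # From an optional " /" followed by "$c" up to the end of the field.
    without_c = re.sub(r"\s*/?\s*\$c.*$", "", field)
    # "\$." is the "$.?" of the mail without the optional quantifier.
    return re.sub(r"\s*\$.\s*", " ", without_c).strip()
```

Applied to the 245 above (without tag and indicators) it returns "Heritage Books archives. Underwood biographical dictionary. Volumes 1 & 2 revised [electronic resource]". The D-A-CH case has no such shortcut, because the punctuation would first have to be generated from the subfield codes.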

And, as I'm not a typewriter, I would rather process the content of
245 with the help of the semantic clues given by the MARC encoding. Something
with only ". " as remaining delimiters is not much help. (And retrieving
more refined components like $a, $b etc. afterwards and matching them to
specific parts of the combined string above seems to be a lot of work,
comparable to automatic tagging of OCR results.)


 Is this what you mean when you want to say something like "Get me all from
 field XXX until you hit Y"? I guess so.

As I understand the purpose of MARCspec, it is kind of hit and run:

It is not a MOM (MARC Object 

Re: MARC::XML failing on CPAN testers

2012-01-17 Thomas Berger

Hi Jakob,

 MARC::XML is heavily failing on CPAN Testers:
 http://www.cpantesters.org/distro/M/MARC-XML.html

 My PICA::Record module is failing for the same reason (XML namespace support
 for XML::SAX). I don't know how to fix this, so I asked here:

 http://www.perlmonks.org/?node_id=948311

 Maybe someone here knows better?

Sure. I had similar issues in the past when I started to request
a namespace-aware parser from the factory (in order to do it right,
i.e. I wanted to avoid failing tests because of hypothetical
inappropriate parsers).

IIRC the situation is as follows:

- In the absence of ParserDetails.ini, XML::SAX falls back to providing
  XML::SAX::PurePerl *only* when the parser has been requested
  without specific properties.

- ParserDetails.ini is *not* set up on automated / unattended installs
  of XML::SAX (like when a smoke tester's environment pulls it in
  as a dependency of your module).
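A toy Python model of the two points above, assuming ParserDetails.ini lists one [Parser::Name] section per registered parser with "feature-URI = 1" lines. It is not XML::SAX's code: `read_parser_details` and `choose_parser` are hypothetical, and preferring the last matching entry is an assumption about how the factory picks a default.

```python
import configparser
from typing import Dict, Iterable, Optional

PURE_PERL = "XML::SAX::PurePerl"
Details = Dict[str, Dict[str, str]]


def read_parser_details(text: str) -> Details:
    """Read ParserDetails.ini-style text: one [Parser::Name] section per
    registered parser, each with "feature-URI = 1" lines."""
    # Only "=" separates key and value, because feature URIs contain ":".
    cp = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cp.optionxform = str  # keep the feature URIs exactly as written
    cp.read_string(text)
    return {name: dict(cp[name]) for name in cp.sections()}


def choose_parser(details: Optional[Details],
                  required: Iterable[str] = ()) -> Optional[str]:
    """Toy model of the factory's choice; None means "no parser found"."""
    required = list(required)
    if details is None:
        # No ParserDetails.ini: PurePerl only if nothing specific is asked for.
        return None if required else PURE_PERL
    # Assumption: the last registered parser that has every feature wins.
    for name, features in reversed(list(details.items())):
        if all(features.get(uri) == "1" for uri in required):
            return name
    return None
```

`choose_parser(None, ["http://xml.org/sax/features/namespaces"])` returns None, which is the smoke-tester situation: no ParserDetails.ini plus a namespace-aware request means no parser at all.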

Unfortunately I have no clue how this might be fixed (apart from changing
the installation behavior of XML::SAX, which might be as it is for a
reason).

HTH
Thomas Berger


Re: OAI::Harvester installation help

2011-05-18 Thomas Berger

Hello,

Thanks to further input from the original reporter, I (as co-maintainer of
Net::OAI::Harvester) have been able to sort out the issues illustrated by the report:

- One of the repositories used in the test suite recently changed its
  address, causing some tests to fail: fixed

- The LibXML family of parsers is very noisy in the test for illegal
  XML, making it hard to notice that the test actually succeeds (not fixed)

Two issues with XML::SAX might be of broader interest:

[not applicable to this thread:
- ParserDetails.ini
  XML::SAX (::ParserFactory) uses a text file, ParserDetails.ini, located
  in the folder SAX.pm resides in or (Debian only?) under /etc/perl/...
  This file contains the list of known parsers in this installation and
  their properties. I have noticed several times that this file was not
  generated (because of non-interactive dependency installs?), and
  subsequently installed individual parsers were not registered.
  My impression is that the XML::SAX framework should fall back to
  XML::SAX::PurePerl, installed by the package itself, but this does
  not seem to happen in the Net::OAI::Harvester test suite (maybe
  because the parser is requested with a required feature).
  Status: not tackled yet
  cf. http://perl-xml.sourceforge.net/faq/#parserdetails.ini
]

- Latest version of XML::SAX::Base
  Some sub-modules of Net::OAI::Harvester use the get_handler() method,
  supplied by XML::SAX::Base as of version 1.04. This module is
  literally hidden in the XML::SAX distribution (its source is generated by
  Makefile.PL to prevent indexing of the module on CPAN). On CPAN there is a
  strain of standalone XML::SAX::Base releases ending at version 1.02,
  which does not contain the method in question. (The README of this
  standalone module warns that you probably do not want to install this
  module but rather the complete XML::SAX framework.)
  When you explicitly request installation of XML::SAX::Base, chances are
  that this fetches version 1.02, which takes precedence over, or actually
  overwrites, version 1.04 installed by XML::SAX, and there is absolutely
  no upgrade path: XML::SAX::Base 1.02 must then be uninstalled/removed
  for things to work again.
  For Net::OAI::Harvester I have refined the requirement on XML::SAX::Base
  to the specific version 1.04, and I'm awaiting the CPAN Testers reports:
  it might well be that more systems than before are trapped into
  installing the wrong module when performing "Build installdeps" and thus
  effectively cut themselves off from executing the tests at all.
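Why 1.02 wins can be modelled in a few lines of Python: Perl loads the first XML/SAX/Base.pm it finds along @INC, and get_handler() exists only from 1.04 on. The function names and paths are hypothetical, and only plain decimal versions like "1.02" or "1.1" are handled.

```python
from decimal import Decimal
from typing import List, Tuple


def perl_decimal(version: str) -> Decimal:
    # Plain decimal Perl versions compare as numbers: "1.1" > "1.04".
    return Decimal(version)


def get_handler_available(copies_along_inc: List[Tuple[str, str]],
                          minimum: str = "1.04") -> bool:
    """copies_along_inc: (path, $VERSION) of each XML/SAX/Base.pm found,
    in @INC order. Perl loads only the first one, so a standalone 1.02
    earlier in @INC shadows the 1.04 that ships inside XML::SAX."""
    if not copies_along_inc:
        return False
    _path, version = copies_along_inc[0]
    return perl_decimal(version) >= perl_decimal(minimum)
```

With `[("/hypothetical/site/XML/SAX/Base.pm", "1.02"), ("/hypothetical/vendor/XML/SAX/Base.pm", "1.04")]` the check fails even though a 1.04 copy is installed, which is why removing 1.02 is the only way out.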


Thomas Berger



On 18.05.2011 09:14, Saiful Amin wrote:

 I am attempting to use some code which depends on Net::OAI::Harvester,
 but my attempts to install OAI::Harvester are running into problems
 with:

 Any suggestions for getting this installed properly?  I'm assuming that
 this is a case where a simple force install isn't going to get me a
 working installation...

 
 I've used Net::OAI::Harvester on both Ubuntu and Windows XP for my projects.
 On XP I've used Strawberry Perl, with which installing via CPAN luckily
 worked without any problem. In Ubuntu, I had to install from Synaptic when
 CPAN failed. If I remember correctly, the command was:
 # sudo apt-get install libnet-oai-harvester-perl
 
 It works better than cpan in managing dependencies in my experience.
 
 Regards,
 Saiful Amin
 