Re: Getting started with Z39.50

2017-04-06 Thread Patrick Hochstenbach
Hi, there is also Z39.50 support in Catmandu:

https://metacpan.org/pod/Catmandu::Importer::Z3950

E.g. to fetch MARC you can do:

$ catmandu convert Z3950 --host z3950.loc.gov --port 7090 \
    --databaseName Voyager --query "(title = dinosaur)" to MARC

A Perl API is included.
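A minimal sketch of that API, following the attributes shown in the module's documentation (it needs the YAZ toolkit and the Net::Z3950::ZOOM bindings installed, so treat the attribute values as an illustration):

```perl
use Catmandu::Importer::Z3950;

# Attribute names follow the Catmandu::Importer::Z3950 docs;
# adjust queryType (PQF or CQL) and query to taste.
my $importer = Catmandu::Importer::Z3950->new(
    host                  => 'z3950.loc.gov',
    port                  => 7090,
    databaseName          => 'Voyager',
    preferredRecordSyntax => 'USMARC',
    queryType             => 'CQL',
    query                 => '(title = dinosaur)',
);

# each() calls the callback once per fetched record.
my $n = $importer->each(sub {
    my $record = shift;   # a hashref with the MARC data
    # ... process $record ...
});
```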

Patrick

From: Eric Lease Morgan 
Sent: Thursday, April 6, 2017 11:11:11 PM
To: Perl4lib
Subject: Re: Getting started with Z39.50

On Apr 6, 2017, at 4:44 PM, charles hobbs via perl4lib  
wrote:

> Hello all, I haven't posted here for a long time, but have been doing lots 
> of interesting stuff with MARC/Perl.
>
> I would like to know an easy way to get started with Z39.50. (For example, 
> how to get MARC records from the LC, NLM, etc. servers)
> Anyone have some program segments they would be willing to share?
>
> Thanks for your time and help.


Here is a link to a simple (read, “rudimentary”) Z39.50 search interface:

  http://sites.nd.edu/emorgan/2013/11/fun/

But I’m sure your mileage will vary.

If you are comfortable with Perl, then I suggest you play with YAZ/ZOOM:

  http://search.cpan.org/~mirk/Net-Z3950-ZOOM/

—
Eric Morgan


Re: identify ISSN numbers in an mrc file

2016-11-02 Thread Patrick Hochstenbach
> > > count of occurrences will be printed instead.
> > >
> > > Examples: -f '100,245'  will print field 100 and 245
> > >           -f '400,#400' will print all occurrences of the 400
> > >                         field as well as the number of its occurrences
> > >
> > > -o  Output format: "marc" for ISO2709, "line" for each subfield in
> > >     a line, "inline" (default) for each field in a line.
> > > -s  Specify a string separator for condition. Default is ','.
> > > -v  Invert the sense of matching, to select non-matching records.
> > > -V  Print the version and exit.
> > > file.mrc
> > >     The mandatory ISO2709 file to read. Can be STDIN, '-'.
> > >
> > > DESCRIPTION
> > >
> > >     Like grep, the famous Unix utility, MARCgrep.pl allows you to filter
> > >     MARC bibliographic records based on conditions on tag, indicators,
> > >     and field value.
> > >
> > >     Conditions can be applied to data fields, control fields or the
> > >     leader. In case of data fields, the condition can specify tag,
> > >     indicators, subfield and value using regular expressions. In case of
> > >     control fields, the condition must contain the tag name, the starting
> > >     and ending position (both 0-based), and a regular expression for the
> > >     value.
> > >
> > >     Options -c and -v allow respectively to count matching records and to
> > >     invert the match.
> > >
> > >     If option -c is not specified, the output format can be "line" or
> > >     "inline" (both human readable), or "marc" for MARC binary (ISO2709).
> > >     For formats "line" or "inline", the -f option allows to specify
> > >     fields to print.
> > >
> > >     You can chain more conditions using
> > >
> > >     ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e condition2 -
> > >
> > > KNOWN ISSUES
> > >
> > >     Performance.
> > >     Accepts and returns only UTF-8.
> > >     Checks are case sensitive.
> > >
> > > AUTHOR
> > >
> > >     Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
> > >     Stefano Bargioni <bargi...@pusc.it>
> > >
> > > SEE ALSO
> > >
> > >     marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep>
> > >     for filtering large data sets
> > >
> > > > On 02 nov 2016, at 09:57, Sergio Letuche <code4libus...@gmail.com> wrote:
> > > >
> > > > Hello community,
> > > >
> > > > how would you treat the following?
> > > >
> > > > I need a way to identify all tags - subfields, that have stored an ISSN
> > > > number in them.
> > > >
> > > > What would you suggest as a clever approach for this?
> > > >
> > > > Thank you
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
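On Sergio's original question: an ISSN has a fixed shape (four digits, a hyphen, three digits plus a check digit), so one plain-Perl approach is to scan each subfield value for ISSN-shaped strings and verify the check digit before reporting the tag/subfield. A sketch (looks_like_issn is a hypothetical helper, not part of any module):

```perl
# Return true if $s contains an ISSN-shaped string with a valid check digit.
sub looks_like_issn {
    my ($s) = @_;
    return 0 unless $s =~ /\b(\d{4})-?(\d{3})([0-9Xx])\b/;
    my @d    = split //, "$1$2";
    my $want = uc($3) eq 'X' ? 10 : $3;
    my $sum  = 0;
    $sum += $d[$_] * (8 - $_) for 0 .. 6;    # weights 8..2 over the 7 digits
    return (11 - $sum % 11) % 11 == $want;   # modulus-11 check digit
}
```

Looping over every field/subfield value of a MARC::Record (or of a Catmandu MARC hash) and calling this on each value would then yield the list of tags and subfields that hold ISSNs.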

Patrick Hochstenbach - digital architect
University Library Ghent
Sint-Hubertusstraat 8 - 9000 Ghent - Belgium
patrick.hochstenb...@ugent.be
+32 (0)9 264 7980



RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-09 Thread Patrick Hochstenbach
Hi Robin

Sure

 join_field(subject.*, ); 
 join_field(subject, '<br>');

The first join concatenates all the subfields. The second join is for all the 
fields.

In the new Catmandu version we are enhancing the language a bit; that's why I 
might have written my previous examples with the new syntax.

Greetings from ELAG2014 in Bath!

Patrick

From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Monday, June 09, 2014 4:58 AM
To: perl4lib
Subject: Re: Converting MARC fields with Catmandu - repeated subfields being 
squished together.

Robin Sheat wrote on Mon 09-06-2014 at 14:50 [+1200]:
 $ cat test.fixes
 marc_map('650','subject',join:'###');
 remove_field('record');

Ah, I found that I need to change the syntax a bit:

marc_map('650','subject', -split => 1);

gives me:

{"subject":[["Counting","Pictorial works","Juvenile
literature."],["English language","Alphabet","Pictorial works","Juvenile
literature.","14467"],["Time","Pictorial works","Juvenile
literature.","15531"],["Children's stories, English","Pictorial
works."]],"_id":5567128}

which is closer. Is there an easy way to flatten those arrays?

Otherwise I can go with join and the split, but this way seems cleaner.

Actually, I wonder if nested arrays would work even better for my
purposes, I guess I should test it...
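One dependency-free way to flatten one level of nesting is a plain-Perl map (not a Catmandu fix; the data below is made up, in the shape shown above):

```perl
# Data in the shape produced by marc_map with -split: an array of arrays.
my $subject = [
    [ 'Counting', 'Pictorial works', 'Juvenile literature.' ],
    [ 'Time',     'Pictorial works', 'Juvenile literature.' ],
];

# Flatten one level by dereferencing each inner arrayref.
my @flat = map { ref $_ eq 'ARRAY' ? @$_ : $_ } @$subject;
```

A custom fix could wrap the same map if this is needed inside a fix file.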

--
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF

RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-05 Thread Patrick Hochstenbach
Hi Robin

By default all repeated subfields get joined by a space; you can set this 
with the 'join' option:

marc_map('650v','subject',join:'%%%')

gives you:

subject,Pictorial works%%%Juvenile

Or, if you have many 650 fields they are all joined into one string:

subject,Pictorial works%%%Juvenile%%%foo%%%bar%%%test

With the split_field command you can turn this back into an array:

split_field('subject','%%%')

gives you

subject,[Pictorial works,Juvenile,foo,bar,test]
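The round trip is easy to see in plain Perl (no Catmandu needed), with '%%%' as the separator:

```perl
# What the join option produces: repeated values glued into one string.
my @values = ('Pictorial works', 'Juvenile', 'foo', 'bar', 'test');
my $joined = join '%%%', @values;
# $joined is 'Pictorial works%%%Juvenile%%%foo%%%bar%%%test'

# What split_field then does: cut the string back into an array.
my @parts = split /%%%/, $joined;
```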

Cheers
Patrick

P.S. Indeed, the marc_map.pl is a bit cryptic. We compile Perl scripts to 
make execution much faster. The developers are now figuring out how to 
refactor this compilation out so that the Fix packages are easier to read.

From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Friday, June 06, 2014 5:11 AM
To: perl4lib
Subject: Converting MARC fields with Catmandu - repeated subfields being 
squished together.

I'm using catmandu to JSON-ise MARC records for storage in
Elasticsearch, and seem to have come up with something that I can't
readily see how to fix (without getting down and dirty with fixers.)

I have a record that has this:

["650"," ","0","a","Time","v","Pictorial works","v","Juvenile
literature.","9","15531"]

and a mapping:

marc_map('650v', 'subject.$append')

This works well enough in most cases, however when the subfield is
doubled up, I end up with:

"subject":["Time","Pictorial worksJuvenile literature."]

The $append doesn't seem to apply in this case. This only seems to
happen with repeats within a field; other 650$v subfields end up in their
own strings, though they suffer the same problem.

Is this a bug in Catmandu-MARC? I've tried reading the marc_map.pl file,
but the lack of internal documentation, and the nature of what it's
doing make it not the easiest thing to understand.

--
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF

RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-05 Thread Patrick Hochstenbach
Btw, I've updated the Fixes cheat sheet on our wiki to reflect your question :)

https://github.com/LibreCat/Catmandu/wiki/Fixes-Cheat-Sheet

From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Friday, June 06, 2014 5:11 AM
To: perl4lib
Subject: Converting MARC fields with Catmandu - repeated subfields being 
squished together.

I'm using catmandu to JSON-ise MARC records for storage in
Elasticsearch, and seem to have come up with something that I can't
readily see how to fix (without getting down and dirty with fixers.)

I have a record that has this:

["650"," ","0","a","Time","v","Pictorial works","v","Juvenile
literature.","9","15531"]

and a mapping:

marc_map('650v', 'subject.$append')

This works well enough in most cases, however when the subfield is
doubled up, I end up with:

"subject":["Time","Pictorial worksJuvenile literature."]

The $append doesn't seem to apply in this case. This only seems to
happen with repeats within a field; other 650$v subfields end up in their
own strings, though they suffer the same problem.

Is this a bug in Catmandu-MARC? I've tried reading the marc_map.pl file,
but the lack of internal documentation, and the nature of what it's
doing make it not the easiest thing to understand.

--
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF

RE: [librecat-dev] A common MARC record path language

2014-02-19 Thread Patrick Hochstenbach
Hi Carsten

Thanks for the new spec. I think it is a great initiative to align the many 
projects that process MARC records. Here are some general remarks that I hope 
we can use to discuss the spec more in depth.

What I'm missing while reading the specification is a separate use-case 
document. In the spec I see sections, such as the introduction of '2 Expressing 
MARCspecs as string' and 2.1, which are design concerns and require a separate 
discussion from the formal part of the document. I mean, I can agree or 
disagree with the design concerns... with the formal section I should be able 
to say whether it is correct or not.

The discussion we are having in this email thread deserves a separate document 
of use-cases. Producing Linked Data is only one of the cases. SolrMarc is about 
transforming MARC into something that can be sent to Solr. In ILS systems you 
might use it to point to the parts of MARC you want to display in a web 
interface. In Catmandu you might want to produce reports. Every use-case can 
have its own needs for making parts of MARC easily addressable.

We need tools like easyM2R, SolrMarc and Catmandu not only because of the 
verboseness of XPath or because it is tied to one possible serialization of 
MARC. Of course I love to write

100$a instead of /marc:record//marc:datafield[@tag='100']

This opens up a new class of easy DSL tools to process our datasets. 

But... this treats MARC as a key-value exchange format for bibliographical 
data. And I can't agree with that... or not in a strict sense. I can just as 
easily state that MARC is a mark-up language that requires more processing 
after the first mappings have been made. E.g. if you want to map 260$c to an 
xsd:date field you really need to get rid of the trailing dot '.' at the end. 
MARC is a key-value exchange format only as a first approximation.
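The 260$c case is a one-liner in plain Perl, but the point stands that it is post-processing, not path selection (a sketch with a made-up value):

```perl
# A typical 260$c value carries an ISBD trailing dot.
my $c260 = '1999.';

# Strip the trailing dot (and stray whitespace) before mapping to xsd:date.
(my $date = $c260) =~ s/\s*\.\s*$//;
# $date is now '1999'
```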

Using cataloging rules you can get much more information out of the record. And 
I wonder if in a second approximation we could add paths that implement some of 
that logic.

For instance, as a silly example:

245{/$.} : could evaluate to everything in 245 until you hit the first '/' 
before a subfield

In Catmandu... well, we don't have a spec for that. We do the same thing as 
easyM2R and SolrMarc and create a small DSL of functions that take MARCspecs 
as input. Of course we could all agree on the same collection of functions, 
like move_field, split_field, copy_field, etc. But I hope there are other 
options too.

Cheers
Patrick


From: Klee, Carsten [carsten.k...@sbb.spk-berlin.de]
Sent: Wednesday, February 19, 2014 2:27 PM
To: 't...@gymel.com'; Patrick Hochstenbach
Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
Subject: AW: [librecat-dev] A common MARC record path language

Hi Thomas and Patrick!

I think the whole problem lies in the limited expressivity of strings. MARCspec 
is pretty close to XPath in its approach, but without regular expressions 
and functions like first(), last() etc. But even with XPath it would be pretty 
hard to get the character before a subfield in a MARCXML file.

The only solution I can think of is using regular expressions. And I'm not 
convinced that bringing this into MARCspec is a good idea. As I already 
mentioned in the spec, MARCspec is not independent from the application using 
MARCspec. Taking regular expressions into MARCspec wouldn't make the 
application more usable, but would blow up the specification.

One example:

The data in field 245 is:

$aConcerto per piano n. 21, K 467$h[sound recording] /$cW.A. Mozart

The desired result is (rule: take everything from 245 until the string ' /$' 
appears):

Concerto per piano n. 21, K 467 [sound recording]

Imagine a MARCspec with regular expression. // pseudo code coming up!

marcspec = 245.match(/(.*)\s\/\$/)
titleData = getMARCspec(record, marcspec)
print titleData[1]
// should result in $aConcerto per piano n. 21, K 467$h[sound recording]

Now pretty the same but without the regular expression in the MARCspec.

marcspec = 245
titleData = getMARCspec(record, marcspec).match(/(.*)\s\/\$/)
print titleData[1]
// should result in $aConcerto per piano n. 21, K 467$h[sound recording]

You see, nothing won here.

But an application could provide a special function like

function takeEverythingFromSpecUntilYouHitBeforeSubfield(marcspec, hitWhat, record)
{
// get the data before the / or = or else
regex = new RegExp("(.*)\\s\\" + hitWhat + "\\$")
data = getMARCspec(record, marcspec).match(regex)[1]

// now split on subfield
dataSplit = data.split(/\$[a-z0-9]/)

// loop everything into result
for (i = 1; i < dataSplit.length-1; i++)
{
result += dataSplit[i] + " "
}
result += dataSplit[dataSplit.length-1]

return result
}

In Catmandu or elsewhere the user calls the function

takeEverythingFromSpecUntilYouHitBeforeSubfield(245, /, record)

-- this should result in the desired Concerto per piano n. 21, K 467 [sound 
recording]
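The same idea in runnable Perl, operating on a flat field string with $-prefixed subfield markers (take_until_hit is a hypothetical helper mirroring the pseudo code above, not a Catmandu or MARCspec API):

```perl
# Take everything from a flat field string until $hit appears right before
# a subfield marker, strip the markers, and join the pieces with spaces.
sub take_until_hit {
    my ($field, $hit) = @_;
    my ($data) = $field =~ /(.*)\s\Q$hit\E\$/s or return $field;
    my @parts = grep { length } split /\$[a-z0-9]/, $data;
    return join ' ', @parts;
}

my $f245 = '$aConcerto per piano n. 21, K 467$h[sound recording] /$cW.A. Mozart';
print take_until_hit($f245, '/'), "\n";
# prints: Concerto per piano n. 21, K 467 [sound recording]
```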

Re: AW: [librecat-dev] A common MARC record path language

2014-01-21 Thread Patrick Hochstenbach
Hi Carsten

Apologies for the late reply; it took a while to get the system booted
after the winter vacation.

You are right in the discussion about which parts should be specified by a
MARCspec language and which parts should be implemented as operations on the
nodes found. I gave the examples not as a hint for the implementation
language (e.g. whether it requires regular expressions or not) but as
examples of MARC in the wild (non-standard tags) and MARC combined with
cataloging rules (where subfields and characters in front of a subfield
have a special meaning).

In daily work I often encounter mapping rules which involve these special
subfield cases (“Take everything from the 245 until you hit the first /
before a subfield”). These things can’t easily (can they?) be expressed in
XPath when using XSLT, or in MARCspec when using tools like Catmandu... but
they are very common and could be shared across tools. I think these would
be candidates to formalise.


Cheers
Patrick

On 06/01/14 16:33, Klee, Carsten carsten.k...@sbb.spk-berlin.de wrote:


On the other hand I could imagine something like 100[0] for the first
100 field (author) and 100[1] for the second, and so on. But what about
repeatable subfields? Maybe someone requires the first subfield a
of the second 100 field. Besides, the characters '[' and ']' are also
valid subfield codes (see [2]).

With substrings it is more complicated. I could only imagine using
regular expressions. Maybe something like 245a[/\s(.*)]_10. But for
usability reasons this might be better left to the applications. Isn't
there something in Catmandu like
marc_map('245','my.title', -substring-after => '/'); ??

Maybe you have another solution for that?

Another issue I suspect with your last example under
https://metacpan.org/pod/Catmandu::Fix::marc_map

# Copy all 100 subfields except the digits to the 'author' field
marc_map('100^0123456789','author');

In the current MARCspec this would be interpreted as a reference to
subfields ^, 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 of field 100. This is
because ^ is a valid subfield code (see [2]).

So far... I would be happy to read more comments on this.

Cheers!

Carsten
 

[1] https://github.com/cKlee/marc-spec/issues
[2] http://www.loc.gov/marc/specifications/specrecstruc.html#varifields
___
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin – Preußischer Kulturbesitz

Fon:  +49 30 266-43 44 02

 -Original Message-
 From: Patrick Hochstenbach [mailto:patrick.hochstenb...@ugent.be]
 Sent: Friday, 20 December 2013 14:06
 To: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
 Cc: Klee, Carsten
 Subject: Re: [librecat-dev] A common MARC record path language
 
 Hi
 
 Thanks for this initiative to formalise the path language for MARC
 records. In Catmandu our path language is better described at:
 https://metacpan.org/pod/Catmandu::Fix::marc_map. It would be an easy fix
 for us to follow Carsten's MARCspec rules and I will gladly implement it
 for our community.
 
 We see these types of MARC paths in programming libraries such as the
 projects mentioned below, but also in products like XSLT, SolrMarc,
 ILS vendors who need them to define how to index MARC, and standardisation
 bodies that provide mapping rules (e.g.
 http://www.loc.gov/standards/mods/mods-mapping.html). I tried to make a
 small roundup of these projects in the past, but it would be great to have
 a more extensive look at all current practices.
 
 In our Catmandu project we found that XPaths are too verbose for our
 librarians to interpret and in practice tied to XSLT programming, which
 requires quite some programming skills to read and interpret.
 
 Our paths are very much simplified but still seem to lack some things
 that are available in the MARC data model and would be great to have
 available in the MARCspec syntax:
 
  - Notion of pointing to the first item (first author)
  - Support for locally defined MARC (sub)fields (e.g. Ex Libris exports
 contain all kinds of Z30, CAT, etc. fields)
  - Support for pointing to subfields that follow a specific character
 (e.g. in titles I would like to point to everything after the '/' in a 245
 field).
 
 Cheers and have a nice holiday
 
 Patrick
 
 
 On 19/12/13 13:16, Jakob Voß v...@gbv.de wrote:
 
 Hi,
 
 Carsten Klee specified a simple path language for MARC records, called
 MARC spec. In short it is a formal syntax to refer to selected parts
 of a MARC record (similar to XPath for XML):
 
 http://collidoscope.de/lld/marcspec-as-string.html
 http://cklee.github.io/marc-spec/marc-spec.html#examples
 
Similar languages have been invented before but not with a strict
specification, as far as I know. For instance the Perl module Catmandu::MARC
supports references to MARC fields:
 
 https://metacpan.org/pod/Catmandu::Fix::Inline::marc_map
https://metacpan.org/source/NICS/Catmandu-MARC-0.103/lib/Catmandu/Fix/Inline/marc_map.pm#L26
 
 Could you

Catmandu and MODS::Record

2013-08-06 Thread Patrick Hochstenbach
Hi all

LibreCat
-=-=-=-=

LibreCat is an open collaboration of the university libraries of Lund, Ghent, 
and Bielefeld to create tools for library and research services. One of
the toolkits we provide is called 'Catmandu' 
(http://search.cpan.org/~nics/Catmandu-0.5004/lib/Catmandu.pm) which is a suite 
of tools to do ETL processing
on library data. We provide tools to import data via JSON, YAML, CSV, MARC, 
SRU, OAI-PMH and more. To transform this data we created a small DSL 
that librarians use in our institutions. We also make it very easy to store 
the results in MongoDB, Elasticsearch or Solr, or to export them into various
formats.

We also create command-line tools because we felt that in our daily jobs we 
were writing the same type of ad hoc Perl scripts over and over for endless 
reports. 

E.g. to create a CSV file of all titles in a MARC export we say something like:

$ catmandu convert MARC to CSV --fix 'marc_map(245,title); 
retain_field(record);' < records.mrc

To get all titles from our institutional repository we say:

$ catmandu convert OAI --url http://biblio.ugent.be/oai to JSON --fix 
'retain_field(title)'

To store a MARC export into a MongoDB we do:

$ catmandu import MARC to MongoDB --database_name mydb --bag data < records.mrc

Here is a blog post about the commands that are available: 
http://librecat.org/catmandu/2013/06/21/catmandu-cheat-sheet.html

See our project page for more information about LibreCat and Catmandu : 

http://librecat.org

and a tutorial how to work with the API

http://librecat.org/tutorial/

MODS::Record
-=-=-=-=-=-=
In one of our Catmandu projects we created a Perl connector for Fedora Commons 
(http://search.cpan.org/~hochsten/Catmandu-FedoraCommons-0.24). One of our 
goals was to integrate better with the Islandora project. For this we needed a 
Perl MODS parser. As there was no module available on CPAN, we provide a 
top-level module like MARC::Record, called MODS::Record: 
http://search.cpan.org/~hochsten/MODS-Record-0.05/lib/MODS/Record.pm. I hope 
this will be of some help for the community. If there are coders here who would 
like to contribute to the MODS package, please drop me a line. I think CPAN 
MODS support shouldn't depend on one coder, one institution.

Greetings from a sunny Belgium,
Patrick