Re: identify ISSN numbers in an mrc file

2016-11-02 Patrick Hochstenbach
In Catmandu you can do this with this script (which will also filter out all valid ISSN numbers)…

    # cpanm Catmandu Catmandu::Identifier
    $ cat myfix.txt
    marc_map('***',text.$append)
    filter(text,'(\b\d{4}-?\d{3}[\dxX]\b)')
    replace_all(text.*,'.*(\b\d{4}-?\d{3}[\dxX]\b).*',$1)
    do list(path:text)
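For readers without Catmandu, a rough Python equivalent of the three quoted fix steps might look like the sketch below. It assumes each record has already been parsed into a hypothetical list of `(tag, [subfield values])` pairs (with a real .mrc file a MARC parser would produce those first), and it only covers the mapping, filtering and extraction steps shown, not any checksum validation. `all_values` and `issn_candidates` are illustrative names.

```python
import re

# Same pattern the fix uses: 4 digits, optional hyphen, 3 digits, then a digit
# or X/x as the check character, with word boundaries on both sides.
ISSN_LIKE = re.compile(r"\b\d{4}-?\d{3}[\dxX]\b")


def all_values(record):
    """Collect every subfield value of every field (roughly marc_map('***', text.$append)).

    `record` is a hypothetical pre-parsed structure: a list of
    (tag, [subfield values]) pairs.
    """
    return [value for _tag, values in record for value in values]


def issn_candidates(record):
    """Keep only the ISSN-like strings found in the record's values.

    This combines the filter step (drop values without a match) and the
    replace_all step (reduce a value to the matched string).
    """
    found = []
    for value in all_values(record):
        found.extend(ISSN_LIKE.findall(value))
    return found


if __name__ == "__main__":
    record = [
        ("022", ["0317-8471"]),
        ("245", ["Some journal title"]),
        ("500", ["Continues: Old title, ISSN 2049-3630."]),
    ]
    print(issn_candidates(record))  # ['0317-8471', '2049-3630']
```

One behavioural difference: the greedy `.*(...).*` in the replace_all step keeps only the last ISSN-like string in a value, whereas `findall` keeps every one.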

Re: identify ISSN numbers in an mrc file

2016-11-02 Sergio Letuche
Thank you very much.

Re: identify ISSN numbers in an mrc file

2016-11-02 Ben Soares
Hi Sergio,

Try

    ^\d{4}-\d{3}[\dxX]$

if you know that they will always be formatted with a hyphen in the middle, or

    ^\d{4}-?\d{3}[\dxX]$

if you can't be sure of that.

(and if you're interested in spotting ISSNs in the middle of a field use

    \b\d{4}-?\d{3}[\dxX]\b

but beware this also finds year
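To compare the three patterns side by side, and to show one way of weeding out false positives (a year range such as 1990-1999, for example, matches all three), here is a small Python sketch. The check-digit test is an addition not in the message: it applies the standard ISSN mod-11 rule (weights 8 down to 2 on the first seven digits, a check value of 10 written as X). `is_valid_issn` is a hypothetical helper name.

```python
import re

HYPHENATED = re.compile(r"^\d{4}-\d{3}[\dxX]$")        # whole value, hyphen required
OPTIONAL_HYPHEN = re.compile(r"^\d{4}-?\d{3}[\dxX]$")  # whole value, hyphen optional
IN_TEXT = re.compile(r"\b\d{4}-?\d{3}[\dxX]\b")         # anywhere inside a longer value


def is_valid_issn(candidate):
    """Verify the ISSN check character: weights 8..2 on the first seven digits,
    check = (11 - sum mod 11) mod 11, with 10 written as X."""
    chars = candidate.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", chars):
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


if __name__ == "__main__":
    for value in ["0317-8471", "03178471", "ISSN 0317-8471", "1990-1999"]:
        print(
            value,
            bool(HYPHENATED.search(value)),
            bool(OPTIONAL_HYPHEN.search(value)),
            [m for m in IN_TEXT.findall(value) if is_valid_issn(m)],
        )
```

Here "1990-1999" passes all three patterns but fails the check digit (the weighted sum 174 leaves remainder 9, so the expected check character is 2, not 9). A check digit only reduces false positives; an arbitrary 8-digit string still has roughly a 1 in 11 chance of passing.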

Re: identify ISSN numbers in an mrc file

2016-11-02 Sergio Letuche
Thank you, dear Stefano. I am aware of this module, and it works great. But my problem is which clever regex to use to identify whether a subfield's content is an ISSN number. Say our mrc has ISSN numbers thrown into any tag you could imagine... So my approach would be to search the whole mrc,
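Searching every subfield of every record in the .mrc file, as described here, can be sketched with the Python standard library alone by walking the raw ISO 2709 structure (24-byte leader, a directory of 12-byte entries, then fields whose subfields start with 0x1F) and testing each subfield against the anchored optional-hyphen pattern from Ben's reply. This is a minimal sketch, assuming UTF-8 encoded records and skipping the 00X control fields; `iter_subfields`, `find_issn_subfields` and the demo-only `build_record` are hypothetical names, and a MARC library such as pymarc would handle real-world files more robustly.

```python
import re

RECORD_END = b"\x1d"  # ISO 2709 record terminator
FIELD_END = b"\x1e"   # field terminator
SUBFIELD = b"\x1f"    # subfield delimiter

# Optional-hyphen pattern, anchored: the whole subfield must look like an ISSN.
ISSN_WHOLE = re.compile(r"^\d{4}-?\d{3}[\dxX]$")


def iter_records(data):
    """Split raw .mrc bytes into records on the record terminator."""
    for chunk in data.split(RECORD_END):
        if len(chunk) >= 24:  # skip the empty tail after the last record
            yield chunk


def iter_subfields(record):
    """Yield (tag, code, value) for every subfield of every data field."""
    base = int(record[12:17])        # leader/12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries, then a field terminator
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        if tag.startswith("00"):
            continue  # control fields have no indicators or subfields
        field = record[base + start:base + start + length].rstrip(FIELD_END)
        for sub in field[2:].split(SUBFIELD)[1:]:  # field[:2] are the indicators
            yield tag, sub[:1].decode("ascii"), sub[1:].decode("utf-8", "replace")


def find_issn_subfields(data):
    """Return (record number, tag, subfield code, value) for ISSN-shaped subfields."""
    hits = []
    for n, record in enumerate(iter_records(data), start=1):
        for tag, code, value in iter_subfields(record):
            if ISSN_WHOLE.match(value.strip()):
                hits.append((n, tag, code, value.strip()))
    return hits


def build_record(fields):
    """Build a minimal ISO 2709 record from (tag, field bytes) pairs (demo only)."""
    directory, body = b"", b""
    for tag, content in fields:
        content += FIELD_END
        directory += tag.encode("ascii") + b"%04d%05d" % (len(content), len(body))
        body += content
    directory += FIELD_END
    base = 24 + len(directory)
    leader = b"%05dnam a22%05d   4500" % (base + len(body) + 1, base)
    return leader + directory + body + RECORD_END


if __name__ == "__main__":
    raw = build_record([
        ("001", b"rec1"),
        ("022", b"  \x1fa0317-8471"),
        ("245", b"10\x1faSome title\x1fb1990-1999"),
        ("500", b"  \x1faSee ISSN 2049-3630"),
    ])
    for hit in find_issn_subfields(raw):
        print(hit)
```

On a real file this would be called as `find_issn_subfields(open("file.mrc", "rb").read())`. Note that the demo reports the 245 $b year range as well, which is the kind of false positive the check-digit test discussed earlier can filter out.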

Re: identify ISSN numbers in an mrc file

2016-11-02 Stefano Bargioni
Hi Sergio,

you can try MARCgrep (http://en.pusc.it/bib/MARCgrep). Its help is:

    MARCgrep.pl
    Extracts MARC records that match a condition on fields.
    Count and invert are available.

    SYNOPSIS
    MARCgrep.pl [options] [-e condition] file.mrc

    Options:
    -h print
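MARCgrep's own condition syntax is cut off in the help text above, so rather than guess at its options, here is a hypothetical Python analogue of what the help describes: select the records where some field matches a condition, with count and invert. Records are assumed to be already parsed into `(tag, [subfield values])` lists; `marc_grep` and its parameters are illustrative names, not MARCgrep's interface.

```python
import re
from typing import Iterable, List, Tuple

Record = List[Tuple[str, List[str]]]  # hypothetical parsed form: (tag, subfield values)


def marc_grep(
    records: Iterable[Record],
    tag_pattern: str,
    value_pattern: str,
    invert: bool = False,
) -> List[Record]:
    """Return records having a field whose tag fully matches tag_pattern and
    one of whose subfield values matches value_pattern.

    With invert=True, return the records that do NOT match instead.
    """
    tag_re = re.compile(tag_pattern)
    value_re = re.compile(value_pattern)
    selected = []
    for record in records:
        hit = any(
            tag_re.fullmatch(tag) and any(value_re.search(v) for v in values)
            for tag, values in record
        )
        if hit != invert:  # keep matches, or non-matches when inverted
            selected.append(record)
    return selected


if __name__ == "__main__":
    records = [
        [("022", ["0317-8471"]), ("245", ["Journal A"])],
        [("245", ["Book B"]), ("500", ["Printed 1990-1999"])],
        [("245", ["Book C"])],
    ]
    issn = r"^\d{4}-?\d{3}[\dxX]$"
    print(len(marc_grep(records, r"...", issn)))               # count of matching records
    print(len(marc_grep(records, r"...", issn, invert=True)))  # count of the rest
```

The tag pattern `...` matches any three-character tag, so this searches every field, in the spirit of the "any tag you could imagine" case above.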