Re: NACO Normalization and Text::Normalize
Hi Brian: thanks for writing.

On Mon, Aug 25, 2003 at 04:29:37PM -0300, Brian Cassidy wrote:
> As part of a previous project I was importing MARC records into an RDBMS
> structure. In order to facilitate better searching, it was suggested to me
> that I do some normalization on my data and that NACO normalization would
> be a good choice for guidelines. So, away I went and came back with a
> normalize() sub which does the trick. I now wonder if this code would have
> greater utility as a module on CPAN. And if I do decide to upload it to
> CPAN, perhaps a base class (Text::Normalize) should be created to which
> NACO normalization could be added as a subclass.

I think this is a great idea. At first I was thinking that it would be nice to be able to pass your normalize() function a MARC::Record object, which would magically normalize all the relevant fields (like a good cataloger). This could be a subclass MARC::Record::NACO which adds a new method normalize(), or, if Andy was willing, could be added to the MARC::Record core. However, the docs [1] seem to say that it is only possible to determine how a field should normalize in the context of the collection of records it is a part of... and MARC::Record has no way of determining this, so perhaps this idea is not on target?

If you would like to contribute your NACO normalization function to CPAN (as I definitely think you should), and my reading of the LC docs is correct, then I would recommend you add a Text::NACO module. The Normalize part is a bit redundant because all the modules in Text:: do some kind of normalization. The package could export a function normalize() on demand, which you pass a string and get back the NACO normalized version. You could also add it to the Biblio namespace as Biblio::NACO, or MARC::NACO, but that's really your call as the module author :) The main thing is to get it up there somewhere. Please post to the list if you decide to upload.
I'd like to add a section to the tutorial, and to the perl4lib.perl.org website!

//Ed

[1] http://lcweb.loc.gov/catdir/pcc/naco/normrule.html
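For the curious, here's a rough sketch of the general shape a Text::NACO-style normalize() might take. This is my own toy simplification, not Brian's actual sub: the real NACO rules [1] cover many more cases (subfield delimiters, special character mappings, and so on).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A much-simplified, NACO-flavored normalize(): fold case, strip most
# punctuation, collapse whitespace. Illustrative only -- the real rules
# are considerably more involved.
sub normalize {
    my $text = shift;
    return unless defined $text;
    $text = uc $text;              # fold to a single case
    $text =~ s/[^\w\s]//g;         # drop most punctuation
    $text =~ s/\s+/ /g;            # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;       # trim leading/trailing whitespace
    return $text;
}

print normalize("  The Hobbit; or, There and back again.  "), "\n";
```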
Re: NACO Normalization and Text::Normalize
On Wed, Aug 27, 2003 at 09:15:25AM -0300, Brian Cassidy wrote:
> * normalize() inputs: either a MARC::Record object or a string.

This should probably accept an arbitrary number of inputs, so you can do:

    my @normrecs = normalize( @records );

rather than:

    my @normrecs;
    foreach my $rec ( @records ) {
        push @normrecs, normalize( $rec );
    }

But you still could if you wanted to.

> Given a M::R object it would do as the rules state [1] for the appropriate
> fields in the record. Returns a M::R object. Given a string, it would
> apply the string normalization rules. Returns a string.
>
> * compare() inputs: either two M::R objects or two strings. Given two M::R
> objects, both are normalize()'ed. It would return false (or should it be
> true?) if, based on the rules [1], some field in $a matches some field in
> $b. Given two strings, both are again normalize()'ed and a simple cmp is
> performed.

I like the idea of a package MARC::Record::NACO which exports the normalize() and compare() functions. My $.02 is that you not overload normalize() and compare() too much, but create different functions, since you'll have the entire MARC::Record::NACO namespace to play with!

    normalize( $string );
    normalize_record( $record, 100, 110, etc );
    compare( $string );
    compare_record( $record1, $record2, 100, 110, etc );

I know it's heresy, but when it comes to designing programs and interfaces I've come to trust an aspect of the Unix philosophy over the Perl philosophy.

    Unix: Make each program (function) do one thing well.
    Perl: DWIM (Do What I Mean)

I see you've got CPAN modules up there already, but if you need any help with the test suite or anything I would be willing to help out. At any rate, please post to the list if you end up releasing something.

//Ed
Re: MARC::Record leader
On Wed, Sep 10, 2003 at 01:57:31PM -0400, Joshua Ferraro wrote:
>    sub fetch_handler {
>        my ($args) = @_;
>        # warn "in fetch_handler"; ## troubleshooting
>        my $offset = $args->{OFFSET};
>        $offset -= 1;    ## because $args->{OFFSET} 1 = record #1
>        chomp( my $bibid = $bib_list[$offset] );
>        my $sql_query = "SELECT tag, subfieldcode, subfieldvalue
>                         FROM marc_subfield_table WHERE bibid=?";
>        my $sth_get = $dbh->prepare($sql_query);
>        $sth_get->execute($bibid);
>        ## create a MARC::Record object
>        my $rec = MARC::Record->new();
>        ## create the fields
>        while ( my @data = $sth_get->fetchrow_array ) {
>            my $tag           = $data[0];
>            my $subfieldcode  = $data[1];
>            my $subfieldvalue = $data[2];
>            my $field = MARC::Field->new( $tag, '', '',
>                $subfieldcode => $subfieldvalue );
>            $rec->append_fields($field);
>            ## build the marc string and put into $record
>            my $record = $rec->as_usmarc();
>            $args->{RECORD} = $record;
>        }

The call to as_usmarc() will populate the record length for you, so you shouldn't have to do it yourself when building a record on the fly. Were you getting an error somewhere about the record length not being populated?

Your code looks to be creating a bunch of fields, each with one subfield in them. This is not correct. Furthermore, it is unlikely that the order the subfields come back from MySQL is the order in which you will want to build your field... but I may be wrong there (not knowing Koha). I'm sure the Koha folks have some utility for dumping their database as MARC, don't they? If not, they should :)

//Ed
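The grouping fix can be sketched without MARC::Record at all: collect all the subfields for a tag before building a field, instead of one field per database row. The rows below are made-up stand-ins for what the SQL query would return; with MARC::Record installed you would hand each accumulated list to MARC::Field->new(). (Note this naive version also merges genuinely repeated fields with the same tag, which real data would need to handle.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows, as if returned by the query above: [ tag, code, value ]
my @rows = (
    [ '245', 'a', 'The Hobbit /' ],
    [ '245', 'c', 'J.R.R. Tolkien.' ],
    [ '260', 'b', 'Houghton Mifflin,' ],
);

# accumulate *all* subfields for a tag before building the field
my %subfields_for;
my @tag_order;
for my $row (@rows) {
    my ( $tag, $code, $value ) = @$row;
    push @tag_order, $tag unless $subfields_for{$tag};
    push @{ $subfields_for{$tag} }, $code => $value;
}

for my $tag (@tag_order) {
    # with MARC::Record installed, this would be:
    # my $field = MARC::Field->new( $tag, '', '', @{ $subfields_for{$tag} } );
    print "$tag: @{ $subfields_for{$tag} }\n";
}
```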
Re: MARC::Record leader
On Thu, Sep 11, 2003 at 08:40:48AM -0500, Chuck Bearden wrote:
> I hope this helps.

This helps for the order of the fields, but from looking at his program it looks like the more pernicious problem is the order of the subfields within each field!

//Ed
Re: MARC::Record leader
On Fri, Sep 19, 2003 at 07:58:01PM +0530, Saiful Amin wrote:
> I never had to worry about the record_length (pos 00-04) or the
> base_address (pos 12-16) in the leader. I think they are automagically
> updated while writing the record via $rec->as_usmarc().

saiful++

Yes, they should be automatically calculated when writing the file as MARC:

    print $record->as_usmarc();

Albeit, this method should really be as_marc() or as_marc21(), but there you go :)

//Ed
Re: MARC::Record Problems
On Thu, Sep 25, 2003 at 07:54:29AM -0400, Joshua Ferraro wrote:
> Does anyone know how to add separators/terminators when building a single
> MARC record?

Joshua, MARC::Record does this for you. Where is the code you used to generate these records? Is it the Koha code?

//Ed
fulltext searching with Perl
In case you missed it and are interested in such things, perl.com ran a good article recently on building a full text search engine with Perl and any old relational database.

http://www.perl.com/pub/a/2003/09/25/searching.html

It provides examples of how to build and use a reverse (inverted) index efficiently, touches on fancy re-indexing techniques using Class::Trigger, handy math tricks for normalizing your scores, and tips on using Lingua::Stem::En to implement word stemming.

//Ed
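The core idea from the article can be sketched in a few lines of plain Perl: map each word to the set of document ids containing it, then intersect posting sets to answer an AND query. This is a toy illustration of my own, not the article's code; real engines add stemming, stopwords, and relevance scoring on top.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mini-corpus: doc id => text
my %docs = (
    1 => 'perl builds search engines',
    2 => 'relational databases store records',
    3 => 'perl talks to relational databases',
);

# build the inverted index: word => { doc_id => 1 }
my %index;
while ( my ( $id, $text ) = each %docs ) {
    $index{$_}{$id} = 1 for split ' ', lc $text;
}

# AND query: return ids of docs containing every query term
sub search {
    my @terms = map { lc } @_;
    my %hits;
    for my $id ( keys %docs ) {
        $hits{$id} = 1 unless grep { !$index{$_}{$id} } @terms;
    }
    return sort { $a <=> $b } keys %hits;
}

print join( ',', search( 'perl', 'databases' ) ), "\n";
```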
Re: Zeta Perl Module Question
On Wed, Nov 12, 2003 at 12:15:38PM +, Stephen Graham wrote:
> Can't use string ( ) as a HASH ref while strict refs in use at
> /usr/lib/perl5/5.8.0/ExtUtils/MM_Unix.pm line 541.

Weird. I'd be willing to try to help you figure this out if you can point me to the Zeta source. I googled for a bit and wasn't able to find it.

> Maybe it's time to start using the Net::Z3950 module - at least this is
> being maintained.

I think you might be right, if you can port your program without too much pain. If you haven't seen it, Mike Taylor has a nice page about Z39.50/Perl [1].

//Ed

[1] http://www.miketaylor.org.uk/tech/zzperl.html
Re: [patch] Accept # as Blank Indicator
On Wed, Nov 19, 2003 at 07:43:52AM -0500, Morbus Iff wrote:
> The LC also uses $ to represent sub-tags (I think that's what they're
> called; just woke up... the $a/$b things). But, I seem to see _a and _b
> more often. Which is more prevalent?

LC's MARCMaker/MARCBreaker utilities use $ if I remember right. It's mainly a typographical convention that should have little bearing on how MARC::Record works.

//Ed
Re: [ot] Targeted Spam Harvesting from *lib lists?
On Wed, Nov 19, 2003 at 11:50:05AM -0500, Morbus Iff wrote:
> Has anyone encountered targeted spam from perl4lib or oss4lib posts? I've
> posted numerous times to perl4lib, and once to oss4lib. Just now, I
> suddenly got a spam for BowkerLink, which submits to Ulrich's Periodicals
> Directory, something right in line with my earlier questions. Besides
> friends/IM, I've not publicly stated my research into cataloging, so I can
> only suspect it's an incredibly odd coincidence, or there are some asshole
> lurkers on the list. ;)

Spam is a global problem, and not something isolated to this list. As a seasoned Perl user you are no doubt aware of how easy it is to scrape email addresses from webpages. While this list isn't moderated (emails reviewed before posting), we are a polite and friendly bunch. Let's not start calling people names. If you have any concerns with the way the perl4lib list is being managed, please contact Ask Bjorn Hansen at perl.org (ask at develooper.com).

//Ed
Re: MARC::Record in CVS and testing
On Tue, Nov 25, 2003 at 11:14:09AM -0500, Paul Hoffman wrote:
> Are you familiar with Test::More? It has some cool features that can be
> tricky (conditionally skipping tests, TODO tests, etc.), so holler if you
> have questions. I haven't examined MARC::Record's test suite closely, but
> what I've seen looks very well done, so you should be able to learn from
> it.

When you look at MARC::Record's test suite you'll see it's using Test::More :) Andy is a big testing guru, so we've got a decent state-of-the-art test suite!

//Ed
Re: Lint.pm and 250$b
Bryan:

On Tue, Nov 18, 2003 at 02:31:59PM -0600, Bryan Baldus wrote:
> When I ran Lint on a file of records, one of the errors I received was
> 250: Subfield _b is not allowed.

The LC doc [1] is meticulously formatted (which is what allows specs to do what it does). Unfortunately the 250 has a slight defect:

    --Edition, Imprint, etc. Fields 250-270--
    250 - EDITION STATEMENT (NR)
       Indicators
          First - Undefined
             # - Undefined
          Second - Undefined
             # - Undefined
       Subfield Codes
          $a - Edition statement (NR)
          $b - Remainder of edition statement(NR)    <= HERE
          $6 - Linkage (NR)
          $8 - Field link and sequence number (R)

Notice how the (NR) following the $b line doesn't have a preceding space like the other equivalent spots? Well, this threw off the regex in specs, and caused the 250 $b to not make it into Lint.pm's rules. Yucky! But it's fixed. Verification before and after with diff shows that it was the only data point that had that problem.

So **KUDOS** to you for catching it. The fix has been committed to CVS and will go out with the next version of MARC::Record.

//Ed

[1] http://www.loc.gov/marc/bibliographic/ecbdlist.html
Re: indicators - guilt by association
On Sun, Dec 07, 2003 at 08:53:04PM +0100, Leif Andersson wrote:
> Recently on this list it was discussed whether letters as indicators
> should be allowed or not. As I understood it, it was concluded that
> Field.pm and USMARC.pm should be fixed to allow for this. Good, our
> national dialect of the MARC format includes letters as indicators at
> some positions.

Ok, this change has been committed to CVS and will go out with the next version of MARC::Record. Sometimes it helps to add the feature request to MARC::Record's queue at rt.cpan.org so that the developers won't forget about it!

> Currently, if USMARC.pm considers any indicator to be illegal, both
> indicators will be changed into blanks!

I couldn't replicate this with the latest version of MARC::Record. I did add a test to the test suite to make sure that it is never the case, though.

//Ed
Re: Net::Z3950 and diacritics
On Tue, Dec 16, 2003 at 03:52:56PM +0100, Tajoli Zeno wrote:
> 1) When you call LOC without a specific character set you receive data in
> the MARC-8 character set.
> 2) In the MARC-8 character set a letter like è [e grave] is done with TWO
> bytes: one for the sign [the grave accent] and one for the letter [the
> letter e].
> 3) In the leader, positions 0-4 you have the number of characters, NOT the
> number of bytes. In your record there are 901 characters and 903 bytes. In
> fact Perl's length function reads the number of bytes. The best option,
> now, is to use a charset where 1 character is always 1 byte, for example
> ISO 8859-1.

While this is certainly part of the answer, we still don't know why the record length is off. The way I see it, there are two possible options:

1. Net::Z3950 is doing on-the-fly conversion of MARC-8 to Latin1
2. LC's Z39.50 server is emitting the records that way, and not updating
   the record length.

I guess one way to test which one is true would be to query another Z39.50 server for the same record, and see if the same problem exists, in which case 1 is probably the case.

//Ed
Re: Extracting data from an XML file
On Mon, Jan 05, 2004 at 03:54:09PM -0500, Eric Lease Morgan wrote:
> The code works, but is really slow. Can you suggest a way to improve my
> code or use some other technique for extracting things like author, title,
> and id from my XML?

It's slow because you're building a DOM for the entire document, and only using a piece of it. If you use a stream based parser like XML::SAX [1] you should see some good speed improvement, and it won't use so much memory :) XML::SAX uses XML::LibXML, but as a stream. Kip Hampton has a good article, High Performance XML Parsing with SAX [2], which should provide some guidance in getting started with XML::SAX.

SAX is a generally useful technique (in Java land too), and SAX filters are really neat tools to have in your toolbox. I used them heavily as part of Net::OAI::Harvester [3] since OAI responses can be arbitrarily large, and building a DOM for some of the responses could be harmful.

//Ed

[1] http://search.cpan.org/perldoc?XML::SAX
[2] http://xml.com/pub/a/2001/02/14/perlsax.html
[3] http://search.cpan.org/perldoc?Net::OAI::Harvester
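To give a feel for the SAX style: a handler is just a package with start_element(), characters(), and end_element() callbacks that the parser fires as it streams through the document. The sketch below is mine, not from the article; with XML::SAX installed you would hand the handler to a parser (something along the lines of XML::SAX::ParserFactory->parser( Handler => $handler )), but here the events are fed in by hand so the example stays self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# a SAX-style handler that fishes out the text of <title> elements
package TitleHandler;

sub new { return bless { in_title => 0, titles => [] }, shift }

sub start_element {
    my ( $self, $el ) = @_;
    $self->{in_title} = 1 if $el->{Name} eq 'title';
}

sub characters {
    my ( $self, $data ) = @_;
    push @{ $self->{titles} }, $data->{Data} if $self->{in_title};
}

sub end_element {
    my ( $self, $el ) = @_;
    $self->{in_title} = 0 if $el->{Name} eq 'title';
}

package main;

my $handler = TitleHandler->new();

# simulated event stream for: <record><title>The Hobbit</title></record>
$handler->start_element( { Name => 'record' } );
$handler->start_element( { Name => 'title' } );
$handler->characters( { Data => 'The Hobbit' } );
$handler->end_element( { Name => 'title' } );
$handler->end_element( { Name => 'record' } );

print "$_\n" for @{ $handler->{titles} };
```

Notice that nothing is held in memory except the state you choose to keep, which is why this scales to arbitrarily large documents.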
Re: Extracting data from an XML file
On Mon, Jan 05, 2004 at 10:27:39PM -0500, Eric Lease Morgan wrote:
> Since my original implementation is still the fastest, and the newer
> implementations do not improve the speed of the application, then I must
> assume that the process is slow because of the XSLT transformations
> themselves. These transformations are straight-forward:

If you can provide me with the data files I would be willing to write a similar benchmark using XML::SAX :)

//Ed
Re: MARC::Field::new_from_usmarc problems
On Tue, Jan 13, 2004 at 10:48:57AM +0200, Christoffer Landtman wrote:
> Any help on the matter is deeply appreciated as I really cannot make
> anything of this, as it was working on my setup, but not on various other
> peoples' setups.

Could you send the program as an attachment to me, Christoffer? I'm not confident that the MARC will have survived translation into the body of your email message. Thanks!

//Ed

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

The imagination of nature is far, far greater than the imagination of man. [Richard Feynman]
Re: MARC::Field::new_from_usmarc problems
On Tue, Jan 13, 2004 at 06:17:25PM +0100, Leif Andersson wrote:
> If it is in accordance with a special MARC flavour, then maybe
> MARC::Record should do something to meet this need? But, we do not know
> that yet.

Yeah, we could have new_from_xxx() for different MARC flavors, I suppose. It might also be nice to be able to:

    $MARC::RECORD::STRICT = undef;
    $MARC::RECORD::WARNINGS = undef;

which would have the same effects as:

    $batch->strict_off();
    $batch->warnings_off();

and which would allow for calls to new_from_usmarc() without bailing when something looks fishy... for advanced users only :) But it would be even nicer to know exactly what's going on here first.

//Ed

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

The deeper I go the darker it gets. [Peter Gabriel]
Re: Tk-MARC-stuff
On Fri, Jan 16, 2004 at 05:15:34PM -0600, David Christensen wrote:
> Actually, I *was* wondering how to package that all up as a single thingy.
> I imagine it would be something like Tk-MARC-0.1, but I've no idea how to
> bundle packages. I'm searching through docs as we speak :-)

Well, you seemed to be able to bundle up all of them individually; it's really no different. If you need help let me know and I'll lend a hand.

//Ed
Perl and GIS data
I'm forwarding this along in case there are any perl4lib folks who are interested in GIS systems/data.

//Ed

From: Aran Deltac [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Geography Namspace
Date: Sat, 7 Feb 2004 12:13:59 -0500

I've begun some preliminary work on the geography/geo/gis namespace issue here:

http://wiki.bluedevbox.com/newgeo/index.htm

Anyone interested in contributing is welcome to. For anyone not in the loop, this is in response to my original posting and ensuing thread:

http://groups.google.com/groups?q=for+Geo::Datahl=enlr=lang_enie=UTF-8oe=UTF-8selm=F5Qqb.78329%24fl1.3346003%40twister.southeast.rr.comrnum=2

As details get hammered out I'll re-post to the group to let people know what is going on.

Aran
Re: unsubsribe
On Wed, Feb 25, 2004 at 10:56:17AM -0600, Holly Bravender wrote:
> Take me off your list! Thank you.

Holly, please send a message to [EMAIL PROTECTED] and respond to the confirmation that you should receive. Instructions are available at http://perl4lib.perl.org

If you have trouble please email me directly at [EMAIL PROTECTED].

Thanks,
//Ed
Re: XML Parsing for large XML documents
Hi Rob:

On Wed, Feb 25, 2004 at 03:31:07PM -0500, Robert Fox wrote:
> 1. Am I using the best XML processing module that I can for this sort of
> task?

XPath expressions require building a document object model (DOM) of your XML file. Building a DOM for a huge file is extremely expensive, since it converts your XML file into an in-memory tree structure where each element is a node. Your system is probably digging into virtual memory (to disk) to keep the monster in memory... which means slow. And you need to slurp the whole thing in before any work can actually start. When processing large XML files you'll want to use a stream based parser like XML::SAX.

> 2. Has anyone else processed documents of this size, and what have they
> used?

Yep, I've used XML::SAX recently and XML::Parser back in the day. XML::Parser use is deprecated now, but once upon a time it was cutting edge :)

> 3. What is the most efficient way to process through such a large document
> no matter what XML processor one uses?

Use a stream based parser instead of one that is DOM based. This applies in any language (Java, Python, etc.). There is a series of good articles on SAX parsing from Perl on xml.com [1]. The nice thing about SAX is that it is not Perl specific, so what you learn about SAX can be applied in lots of other languages. SAX filters [2] are also incredibly useful. Good luck!

//Ed

[1] http://www.xml.com/pub/a/2001/02/14/perlsax.html
[2] http://www.xml.com/pub/a/2001/10/10/sax-filters.html
Re: Problems testing MARC::Charset-0.5
On Tue, Feb 17, 2004 at 10:55:35AM -0300, Oberdan Luiz May wrote:
> I'm running perl 5.8.3 on Solaris 2.6, with the latest version of all
> modules needed, the latest Berkeley DB, all compiled with GCC 3.3.2. Any
> hints?

There was a bug in MARC::Charset v0.5 which was causing the EastAsian Berkeley DB mapping to fail. The failure wasn't evident when I released v0.5 since MARC::Charset::EastAsian was using the installed BerkeleyDB for lookups rather than the one that is generated as part of the perl Makefile.PL process. Both problems have been fixed and were just uploaded to CPAN as v0.6. If you really want the latest package you can get it from SourceForge here:

http://sourceforge.net/projects/marcpm/

Thanks for writing to the list about this, Oberdan!

//Ed
OCLC ResearchWorks and LC NAF via SOAP
Dan Chudnov recently pointed me in the direction of the LC Name Authority File Web Service that OCLC has made available as part of their ResearchWorks efforts [1]. They are doing some very cool stuff these days. I wanted to see how easy it would be to query the service for authority records for Tolkien. Here's what I came up with:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use SOAP::Lite;
    use XML::Simple;

    ## submit request
    my $result = SOAP::Lite
        ->proxy( 'http://alcme.oclc.org/eprintsUK/services/NACOMatch' )
        ->getNameAuthority(
            name( 'name'    => 'Tolkien' ),
            name( 'maxList' => 100 )
        )
        ->result();

    ## parse xml response
    my $names = XMLin( $result );

    ## output each result
    foreach my $match ( @{ $names->{ match } } ) {
        print $match->{ establishedForm }, "\n";
    }

Which I was pleased with, so I wrote a little command line utility [2] that does the same thing, with documentation if you want to try it out yourself.

//Ed

[1] http://www.oclc.org/research/researchworks/
[2] http://www.inkdroid.org/code/tools/naf

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

Life is short, art long, opportunity fleeting, experience treacherous, judgement difficult. [Hippocrates]
Re: MARC records, and inheritance
Hi Enrico:

On Sun, Mar 14, 2004 at 02:47:26PM -0500, Enrico Silterra wrote:
> I think that having various derived classes of MARC records (Holding, Bib
> Records, Name Authority, etc.) would be useful.

Interesting question. MARC::Record should handle holdings, authority, classification, and community records just fine, since they are all structurally the same. The type of record can be determined by looking at position 6 in the leader. But I'm wondering why you think having MARC::Record::Bibliographic, MARC::Record::Authority (etc.) would be useful. MARC::Record's generic methods already work fine for all these types of MARC records.

Enabling MARC::Lint (which does deal with the semantics of the tags) to understand authority, classification, and community records would be cool. In fact it's on the shortlist of things to do, if you are interested [1].

//Ed

[1] http://rt.cpan.org/NoAuth/Bug.html?id=4813
Re: MARC records, and inheritance
On Mon, Mar 15, 2004 at 09:21:49AM -0500, Enrico Silterra wrote:
> For instance, a holding record has no title fields at all. I think, maybe,
> the title method should throw an exception, or error when you try to grab
> the 245 of a holding or other record. (or call a user defined error
> handler)

Perhaps, but there are really only a handful of methods that are specific to bibliographic records. I'm not sure that adding ways that MARC::Record can fail is worth a lot of time and effort. If you are interested in doing this sort of checking, it can easily be done in your program by checking position 6 in the leader:

    if ( substr( $record->leader(), 6, 1 ) !~ /^[acdefgijkmoprt]$/ ) {
        die "uhoh, this ain't a bibliographic record\n";
    }

> As I think about this, I am not sure that inheritance is the right tool --
> I am beginning to think maybe that there should be records of Bib, Holding,
> NameAuthority, Community, etc which have their own methods, and which
> contain a marc record.

I understand where you are going with this, but I'm not a big fan of the bibliographic specific methods in MARC::Record in the first place, and am of the opinion that adding more would not be a good idea. The meaning of tags, while pretty stable, could change at any time... and the tags mean different things in different flavors of MARC anyway. Were you processing a bunch of MARC data that had bib/authority/holdings records interspersed?

I still think it would be cool if MARC::Lint could grok authority, holdings (etc.) records in addition to bib records. Of course there are other people who use MARC::Record who may agree with you :) This is just the first time I've heard it come up in the last four years.

If I haven't scared you off, and you end up using MARC::Record for a project, would you be willing to send a couple of sentences describing your work so we could add it to the website?

//Ed
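The leader check is easy to wrap up as a little function on the raw leader string, so it can be tested without a MARC::Record object at all. A quick sketch (the character class is the one from the snippet above, covering the MARC21 bibliographic type-of-record codes at leader position 06):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# true if leader position 06 (type of record) is a bibliographic code
sub is_bibliographic {
    my $leader = shift;
    return substr( $leader, 6, 1 ) =~ /^[acdefgijkmoprt]$/ ? 1 : 0;
}

# 'a' at position 06 = language material (bibliographic);
# 'z' at position 06 = authority data
print is_bibliographic('00901cam  2200241 a 4500') ? "bib\n" : "not bib\n";
```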
info: Making Dictionaries with Perl
perl.com just published an article about creating dictionaries with Perl by Sean Burke. -- Sean Burke is a linguist who helps save dying languages by creating dictionaries for them. He shows us how he uses Perl to lay out and print these dictionaries, using RTF::Writer and some data structure manipulation. http://www.perl.com/pub/a/2004/03/25/dictionaries.html
Re: Adding non standard MARC subfields with MARC::Record
On Fri, Apr 02, 2004 at 11:35:40AM -0500, Michael Bowden wrote:
> Sirsi uses some non standard subfields to create links between records.
> Typically these subfields are '?' and '='. How can I add these non
> standard subfields to records that I am creating/editing with
> MARC::Record?

MARC::Record is actually quite lenient about what you can use as a subfield.

    $record->append_fields(
        MARC::Field->new( 245, 0, 0, '?' => 'foo', '=' => 'bar' )
    );

Just make sure you quote '?' and '=' or else weirdness will ensue. :)

//Ed
Re: STDIN as well as command line input
On Mon, Apr 26, 2004 at 10:14:51AM -0500, Eric Lease Morgan wrote:
> % bar.pl | foo.pl
>
> But alas, foo.pl never seems to get the input sent from bar.pl. It does
> not seem to read from STDIN. What should I do to my first program (foo.pl)
> so it can accept command line input as well as input from STDIN?
>
> --
> Eric Lease Morgan (574) 631-8604

Try using the magic filehandle. So in foo.pl:

    while ( defined( my $line = <> ) ) {
        ...
    }

The magic filehandle will read stuff from the files named in @ARGV and will also read from STDIN.

//Ed

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

We act as though comfort and luxury were the chief requirements of life, when all that we need to make us happy is something to be enthusiastic about. [Einstein]
Re: baffling perl/linux problem
On Wed, Jun 23, 2004 at 11:25:48AM -0700, Jon Legree wrote:
> Any suggestions, comments, assistance will be greatly appreciated.

Are we talking about patc_server.cgi? Just out of curiosity, what is the $datapath that is defined at the top of the patc_server.cgi file which indicates what directory to use for storage? Does the directory exist on your filesystem?

Since I'm grasping in the dark I'll grasp a bit further. If you are upgrading, it might be a good idea to get up to perl >= 5.8.0, since perl 5.6 utf8 support was reportedly buggy... and I bet your locale w/ RH7.1 is set to utf8.

//Ed
Re: Displaying diacritics in a terminal vs. a browser
On Thu, Jul 01, 2004 at 11:22:42AM -0400, Houghton,Andrew wrote:
> I'm not sure what MARC::Charset does internally, but MARC-8 defines the
> diacritic separate from the base character. So even using
> binmode(STDOUT, ":utf8") will produce two characters, one for the base
> character followed by the diacritic. If you want them combined then you
> need to combine them.

As you suggest Andy, MARC::Charset simply translates MARC-8 combining characters into UTF-8 combining characters.

> It just so happens that I have recently been converting MARC-XML to RDF.
> The RDF specification mandates Unicode Normal Form C, which means that the
> base character and the diacritic are combined. MARC-XML uses Unicode
> Normal Form D, which means that the base character is separate from the
> diacritic. So I hacked together some Perl scripts to convert Unicode NFD
> -> Unicode NFC. The scripts require Perl 5.8.0.

Wow, I've always been under the impression that the character sets operated the same in RDF as they do in XML proper, with the 'encoding' attribute:

    <?xml version="1.0" encoding="UTF-8"?>

> I was talking with a colleague, just yesterday, about whether we should
> unleash these on the Net... They need to be cleaned up a little and need
> some basic documentation on how to run the Perl scripts. It would be nice
> to have them wrapped up with a module interface for use in
> non-command-line apps.

I would be open to integrating this functionality into MARC::Charset if you are interested.

//Ed
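Incidentally, the NFD -> NFC conversion Andrew describes is available out of the box in Perl 5.8's core Unicode::Normalize module. A small illustration (my own, not Andrew's scripts): 'e' followed by a combining acute accent (form D) composes into the single precomposed character U+00E9 (form C).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD );

# form D: base character plus combining mark (two characters)
my $decomposed = "e\x{0301}";       # 'e' + COMBINING ACUTE ACCENT

# form C: the precomposed equivalent (one character, U+00E9)
my $composed = NFC($decomposed);

printf "NFD length: %d, NFC length: %d\n",
    length $decomposed, length $composed;
```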
Re: Displaying diacritics in a terminal vs. a browser
> A MARC-8 sequence places a combining diacritical mark BEFORE the letter
> it's supposed to combine with, whereas Unicode syntax is to put it AFTER
> the letter it's supposed to combine with. Hence, for example, the letter Z
> with macron below is produced by the MARC-8 sequence 75 5A (macron below +
> Z), but by 005A 0331 (Z + combining macron below) in Unicode. I believe if
> you don't account for this in your UTF-8 transformation, you will get
> either no combining or combining with the wrong character.

Just FYI, in case anyone is curious about what MARC::Charset does: to_utf8() will take care of repositioning the diacritics from before to after the character that they modify.

//Ed
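A toy illustration of that reordering. This is not MARC::Charset's actual code (which works on raw MARC-8 bytes); here the combining marks are assumed to have already been mapped to their Unicode codepoints (e.g. U+0331 COMBINING MACRON BELOW) but left in MARC-8 mark-before-base order, so the swap can be shown with a simple regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# move combining marks (Unicode category Mn) from before their base
# character (MARC-8 order) to after it (Unicode order)
sub reorder_combining {
    my $text = shift;
    $text =~ s/(\p{Mn}+)(\P{Mn})/$2$1/g;
    return $text;
}

my $marc8_order   = "\x{0331}Z";                     # mark + Z
my $unicode_order = reorder_combining($marc8_order); # Z + mark
```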
Re: Skipping batch erroneous record on batch input
On Thu, Aug 05, 2004 at 08:32:32AM -0500, Anne Highsmith wrote:
> How do I skip over the erroneous record and keep processing the rest of
> the file?

    my $batch = MARC::Batch->new( 'USMARC', 'file.dat' );
    $batch->strict_off();

//Ed
Re: [Koha] Cannot add Item - server error
On Tue, Aug 10, 2004 at 02:18:35PM +0200, Paul POULAIN wrote:
> perl4lib ML, MARC::Record maintainer(s), any idea?

Not really, no. I think we'd need chapter and verse from the relevant specs to even start thinking about changing this. Especially after the last go round :)

//Ed
Re: [Koha] Cannot add Item - server error
On Tue, Aug 10, 2004 at 04:22:04PM +0200, Paul POULAIN wrote:
> danmarc2 HAS subfields lower than 010...

It would help if documentation could be found that supports this. It would also be nice if we could see a sample of 10 or so records as well.

//Ed

--
Ed Summers
aim: inkdroid
web: http://www.inkdroid.org

Computers are useless--all they can give you are answers. [Pablo Picasso]
Re: Perl MARC 520
I don't feel like I understand the example very well. Have you considered changing the regex to match multiple punctuation marks at the end of the line?

    $abstract =~ m/([a-zA-Z0-9\.]+)[.!?]*\s*$/x;
                                   ^^^^^^
                                   zero or more punctuation marks

//Ed
Re: Warnings during decode() of raw MARC
On Wed, Aug 18, 2004 at 08:23:59AM -0500, Bryan Baldus wrote:
> Both seem to fail to capture the warnings reported by MARC::File::USMARC.

There appears to be a bug in the MARC::Batch::next() code at line 123, which extracts the warnings from the newly instantiated MARC::Record object and stuffs them into the MARC::Batch object so that they are available at that level:

    my @warnings = $rec->warnings();

The bug is that calling warnings() clears the warnings storage in the MARC::Record object as a side effect. MARC::Batch should probably sidestep calling warnings() and dig into the object directly... or there should be another method that doesn't zap the storage. Let me see if I can duplicate this problem in a test, and then see if the fix actually works. If you can provide a .t file for the MARC::Record distribution that would be handy too :-)

//Ed
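The bug pattern is easy to demonstrate with a toy class (this is a hypothetical stand-in, not MARC::Record's actual implementation): a destructive accessor that empties its storage on read, alongside the kind of non-destructive "peek" accessor that would let MARC::Batch harvest warnings without losing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Noisy;

sub new { return bless { warnings => [] }, shift }

sub add_warning { push @{ $_[0]{warnings} }, $_[1] }

# destructive: returns the warnings AND empties the storage -- the
# behavior that bites a caller who reads warnings on the object's behalf
sub warnings {
    my $self = shift;
    my @w = @{ $self->{warnings} };
    $self->{warnings} = [];
    return @w;
}

# non-destructive alternative: read without clearing
sub peek_warnings { return @{ $_[0]{warnings} } }

package main;

my $rec = Noisy->new();
$rec->add_warning('field 245 has no subfields');

my @peek  = $rec->peek_warnings();   # storage still intact
my @taken = $rec->warnings();        # storage now empty
my @again = $rec->warnings();        # nothing left to report
```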
Re: Warnings during decode() of raw MARC
> ... I've not usually bothered to look at how the tests or the Makefile.PL
> work. This is one reason I haven't tried to distribute my modules through
> CPAN.

What, no OS X yet!? The drag and drop trick is what you are stuck with in MacPerl, and it's kind of a testament to Perl's flexibility that you can even do this. However, you should consider getting your stuff into the CPAN cookie cutter mold if you have the time and energy. Understanding ExtUtils::MakeMaker is not necessary (in fact, that way lies madness), but understanding the little that you have to do to get a distribution together is worth the effort. This way you don't have to distribute the code yourself and it is made available at hundreds of mirrors around the world; plus you benefit from the CPAN tools at large: documentation [1], ticketing [2] and testing [3] (among others).

Even if you don't send your code to CPAN for the rest of the world to enjoy, you can benefit from having installers for your code. Installers come in handy when you need to migrate your code to a new machine, or when recovering from a failure of some kind [knock on wood]. In general, bundling your code up into installable packages encourages you to think of your software in terms of units of functionality (modules), instead of one big mass of interrelated scripts.

Sam Tregar has a book on writing CPAN modules which is a great place to start learning about CPAN if you are interested [4].

//Ed

[1] http://search.cpan.org
[2] http://rt.cpan.org
[3] http://testers.cpan.org/
[4] http://sam.tregar.com/book.html
urchin : RSS aggregator
Apologies if you already saw this over on xml4lib...but it's relevant here given the use of Perl. //Ed === Nature Publishing Group (NPG) are pleased to announce the latest release of their open-source RSS aggregator 'Urchin' to SourceForge (http://urchin.sf.net). Initially funded by the UK Joint Information Systems Committee (JISC, http://www.jisc.ac.uk/) as one of the Publisher and Library/Learning Systems (PALS, http://www.jisc.ac.uk/index.cfm?name=programme_pals) Metadata and Interoperability Group projects, Urchin has been substantially improved. Version 0.92, the current stable release, introduces the following changes: * A mod_perl front end for performance and persistence * Using XML::LibXSLT to improve performance * A new option for using HTTP status codes for error reporting * Old items can now be expunged after an update via an administrator-defined query * Access, admin and error logs * Web-triggerable remote refresh * Configurable RDF output by administrator-defined inclusion or exclusion of namespaces * The ability to combine several simple RDF query conditions using AND and OR * Numerous bug fixes Urchin is a Web based, customisable, RSS aggregator and filter. Its primary purpose is to allow the generation of new RSS feeds by running queries against the collection of items in the Urchin database. However, other arbitrary output formats can be defined and generated using XSL transformations or HTML::Template templates. In other words, the collection of Urchin Perl modules form a foundation for building an RSS aggregation or portal service. Urchin is a classic LAMP implementation written in Perl using, wherever possible, pre-existing Perl modules. It uses MySQL for its database functionality and can run using either Apache with a mod_perl handler or any CGI-enabled web server using the CGI script that is included with the distribution.
While Urchin is developed on a Red Hat Linux system, it has been ported to Mac OS X, and earlier versions have run successfully under Windows 2000, XP and CygWin. This code has been tested on Red Hat Linux 8.0 running Apache version 2.0.40, MySQL version 4.0.13 and Perl version 5.8.0, and on Mac OS X 10.3.5 running Apache version 2.0.49, MySQL version 4.0.16 and Perl version 5.8.1. Urchin's feature set includes: * Reads RSS 0.9*, 1.0 and 2.0 * Stores all incoming data in RSS 0.9x and 1.0 feeds * Queryable on arbitrary data fields * Supports boolean, simple RDF, full RDF querying * Arbitrary output formats Alongside version 0.92, a development code snapshot has been released that includes support for importing Atom feeds, new administrative commands for defining feed aggregates, and the ability to use RSS and Atom auto-discovery links. Urchin is Free Software. Portions of the code are licensed under the GNU General Public License, the rest under the GNU Lesser General Public License.
Re: array references
Hi Eric: On Mon, Nov 01, 2004 at 10:40:36PM -0500, Eric Lease Morgan wrote: In a package I'm writing I initialize an array reference (I think) through DBI like this: $self->{author_ids} = $dbh->selectall_arrayref($query); First of all, what sort of foreach loop can I write to iterate through the contents of $self->{author_ids}? De-reference the array reference, here's one example: foreach my $id ( @{ $self->{author_ids} } ) { ... } Second, how do I undefine the value of $self->{author_ids}? $self->{author_ids} = undef; But if I were you I'd have your constructor initialize the slots that can contain array references to an empty array reference: sub new { my $class = shift; my $self = bless {}, $class; $self->{author_ids} = []; return $self; } The advantage here is that you won't attempt to use undef as an array reference somewhere in your code. This is a runtime error in Perl, so it can result in an unpredictable program if the code isn't exercised all the time. Third, if I have a list of integers, how do I assign the items in this list to $self->{author_ids}? $self->{author_ids} = [ @list_of_integers ]; or if you don't mind the list being referenced from two locations: $self->{author_ids} = \@list_of_integers; Hope this helps! //Ed -- Ed Summers aim: inkdroid web: http://www.inkdroid.org The deeper I go the darker it gets. [Peter Gabriel]
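Putting the pieces above together into a self-contained sketch (the variable names are just for illustration):

```perl
use strict;
use warnings;

# A hash-based object with an array-reference slot, as in the post.
my $self = { author_ids => [] };

# Assign a *copy* of a list to the slot...
my @list_of_integers = (1, 2, 3);
$self->{author_ids} = [ @list_of_integers ];

# ...and iterate by dereferencing the array reference.
my $sum = 0;
foreach my $id ( @{ $self->{author_ids} } ) {
    $sum += $id;
}

# Because [ @list ] copies, changing @list_of_integers afterwards
# does not affect $self->{author_ids}.
push @list_of_integers, 4;
my $count = scalar @{ $self->{author_ids} };  # still 3
```

Had we used \@list_of_integers instead, $count would now be 4, since both names would point at the same array.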
Re: perl-based oai repository
On Thu, Nov 04, 2004 at 12:20:11AM -0500, Eric Lease Morgan wrote: Do y'all know of any Perl-based OAI data repository software? Check out oai-perl, which is from the same group that produces eprints.org, and may in fact be the OAI core of eprints. http://oai-perl.sourceforge.net/ //Ed
Business::ISBN grant
The author of Business::ISBN is looking for a grant from the Perl foundation to update the module to work with 13 digit ISBNs. Business::ISBN is an essential tool for working with ISBNs. If you use the module brian would appreciate it if you could send him a note about how you are using it. Any testimonials will assist him in obtaining the grant. brian d foy can be reached at: [EMAIL PROTECTED] A draft of his grant proposal is below. //Ed -- DRAFT GRANT PROPOSAL TO THE PERL FOUNDATION November 23, 2004 Name brian d foy Email [EMAIL PROTECTED] Synopsis Update Business::ISBN for the 13-digit ISBN format and Amazon.com web services and WorldCat Benefits to the Perl Community I wrote the Business::ISBN module in , and since then it has become a staple for programmers working in the book business, including publishers, retailers, and librarians. I worked in the US and only programmed the 10 digit stuff that I needed. The ISBN Agency (www.isbn.org) has recently announced that the 13 digit ISBN should become the standard for US publishers (instead of the 10 digit version currently in use) so they match the rest of the world. The current version of Business::ISBN does not support this. Many users of Business::ISBN have said they would like to integrate other ISBN tasks, such as Amazon.com and WorldCat lookups. http://www.oclc.org/worldcat/default.htm http://www.amazon.com/gp/browse.html?node=3435361 Deliverables The namespace will change from Business:: to Biblio::. I will deliver Biblio::ISBN 1.0 which will treat 10 digit (legacy) and 13 digit ISBNs on equal footing. I will update Biblio::ISBN::Data for the latest list of country and publisher codes. Additionally, I will add methods to integrate Business::ISBN with Amazon.com web services and WorldCat so an ISBN object can look up book information from these sources. Project Schedule This project can start in December, and should take two to three weeks to research, code, test, and upload.
Bio I'm the original author of Business::ISBN. Amount requested $1000
Re: Character sets
On Wed, Nov 24, 2004 at 08:22:47AM +, Ashley Sanders wrote: Is MARC::Record trying to treat them as Unicode when in fact they are MARC-8? MARC::Record currently does no transformation of character sets that I'm aware of. There is a completely separate module MARC::Charset which provides some MARC8/UTF8 transformation support, but it is functionally separate from MARC::Record. //Ed
Re: Future of MARC::Lint
On Thu, Dec 16, 2004 at 04:00:34PM -0600, Andy Lester wrote: If it's actually been removed from the MANIFEST and doesn't ship, then that's a Bad Thing. I want to wait until there IS a replacement distro available that I can point at. It might be a good idea to release the current MARC::Lint as a separate package to CPAN before releasing new versions. That way we have a baseline to work from. Bryan if you need help doing this for the first time (from SourceForge) let me know and I'll give you a hand (inkdroid on AIM and Yahoo). //Ed -- Ed Summers aim: inkdroid web: http://www.inkdroid.org Computers are useless--all they can give you are answers. [Pablo Picasso]
Re: MARC::Record tests
I'm thinking that the MicroLIF failure is due to line endings being different on Mac versions before OS X. There is code in MARC::File::MicroLIF::_get_chunk that handles DOS (\r\n) and Unix (\n) line endings, but not Mac (\r). Does anyone know if \r is a legit line ending in MicroLIF? //Ed
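If it turns out \r is legit, one way to handle all three conventions at once is to split on them in order; a sketch (not the actual _get_chunk code):

```perl
use strict;
use warnings;

# Split a buffer into lines regardless of DOS (\r\n), Unix (\n),
# or old-Mac (\r) line endings. Order matters in the alternation:
# \r\n must be tried before the single-character endings, or a DOS
# buffer would split into twice as many "lines".
sub split_lines {
    my $buffer = shift;
    return split /\r\n|\r|\n/, $buffer;
}

my @dos  = split_lines("a\r\nb\r\nc");
my @unix = split_lines("a\nb\nc");
my @mac  = split_lines("a\rb\rc");
```

All three calls yield the same three-element list ('a', 'b', 'c').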
Re: MARC::Record and UTF-8
On Fri, Jan 07, 2005 at 08:53:40AM +0100, Ron Davies wrote: I will have a similar project in a few months' time, converting a whole bunch of processing from MARC-8 to UTF-8. I would be very happy to assist in testing or development of a UTF-8 capability for MARC::Record. Is the problem listed in rt.cpan.org (http://rt.cpan.org/NoAuth/Bug.html?id=3707) the only known issue? Correct. A few months ago I hacked at MARC::Record to try to get it to use utf8 for platforms that support perl >= 5.8. I backed out these changes because my initial implementation proved to be faulty. Essentially I treated all data as utf8 if perl was >= 5.8 ... but this didn't work out since some valid MARC-8 data is invalid UTF-8. I was bummed. The problem (as Ron correctly points out) is that the Perl function length() is being used to construct the byte offsets in the record directory. This works fine when a character is a byte, but breaks badly on utf8 data since a character can be more than one byte. Fortunately there is the bytes pragma, introduced in 5.6, which has a bytes::length() function which computes the correct length. I believe that bytes::length() itself was introduced somewhere in 5.8; it was added on later. I wanted MARC::Record to do the right thing based on position 9 in the leader. But I don't know if this is feasible. Perhaps simply having a flag when you create the MARC::Record, MARC::Batch or MARC::File::USMARC objects will be enough: my $batch = MARC::Batch->new( 'USMARC', 'file.dat', utf8 => 1 ); or my $record = MARC::Record->new( utf8 => 1 ); Comments, thoughts, hacks welcome :-) This shouldn't be too tough, it just needs some concentrated attention. //Ed
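The character-versus-byte distinction is easy to see in a few lines (assuming perl >= 5.8):

```perl
use strict;
use warnings;
use utf8;       # string literals in this file are character data
use bytes ();   # load bytes.pm without enabling the pragma file-wide

my $title = "café";      # 4 characters
utf8::upgrade($title);   # make sure perl stores it internally as UTF-8

my $chars = length($title);           # character count: 4
my $bytes = bytes::length($title);    # byte count: 5 (é is two bytes in UTF-8)
```

A directory built with length() here would be one byte short, which is exactly the corruption described in the RT ticket.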
Re: MARC::Record tests and MicroLIF.pm
On Thu, Jan 06, 2005 at 10:03:13PM -0600, Bryan Baldus wrote: Is there any problem with committing the revised version of MARC::File::USMARC, and adding+committing the three files above to cvs in the t/ directory? Nice work :) as long as the tests pass I think committing sounds like a good idea. //Ed
Re: Ignoring Diacritics accessing Fixed Field Data
Hi Jane: On Tue, Jan 11, 2005 at 01:29:55PM -0500, Jacobs, Jane W wrote: My result was something like: Dave,Ayod\2003 Paòt,Kaâs\2002 Baks,Dasa\2003 ,Viâs\2002 Problem 1: As you can see, I don't really want the first four characters, I want the first four SEARCHABLE characters. How can I tell MARC::Record to give me the first four characters, excluding diacritics? What output would you have rather seen? Dave,Ayod\2003 Paot, Kaas\2002 Baks,Dasa\2003 ,Vias\2002 ? Problem 2: In these examples 260 $c works OK, but I could get a cleaner result by accessing the date from the fixed field (008 07-10). How would I do that? I was looking in the tutorial, but couldn't seem to find anything that seemed to help. If I'm missing something there please point it out. You probably want to use the data() method on the MARC::Field object for the '008' field, in combination with substr() to extract a substring based on an offset and a length: my $f008 = $record->field('008'); if ( $f008 ) { $year = substr( $f008->data(), 7, 4 ); } I only added the if statement since it may not be true that all your records have an 008 field... //Ed
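For Problem 1, one possibility (not built into MARC::Record) is to decompose the strings with the core Unicode::Normalize module and then drop the combining marks; a sketch with a hypothetical helper:

```perl
use strict;
use warnings;
use utf8;                         # the literals below contain non-ASCII
use Unicode::Normalize qw(NFD);   # core module as of perl 5.8

# Hypothetical helper (not part of MARC::Record): decompose accented
# characters into base character + combining mark, then drop the marks.
sub strip_diacritics {
    my $text = NFD(shift);   # "ò" becomes "o" + COMBINING GRAVE ACCENT
    $text =~ s/\p{Mark}//g;  # remove all combining marks
    return $text;
}

my $normalized = strip_diacritics("Paòt,Kaâs");  # "Paot,Kaas"
```

Run your headings through something like this before taking the first four characters and the diacritics won't count against you.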
Re: MARC::Lint update
On Sun, Jan 23, 2005 at 08:48:50AM -0600, Bryan Baldus wrote: The SourceForge CVS version of MARC::Lint has been updated with new checks (041, 043), revisions to check_245, a new internal _check_article method, the addition of MARC::Lint::CodeData (for 041, 043, etc.), and 2 new tests. Watch for further added check_xxx methods in the near future, as I move them out of MARC::Lintadditions into MARC::Lint. Thanks for the update Bryan. It's great to see the new test code. I added the new files to the MANIFEST so 'make test' would succeed, and also made some adjustments in MARC::Lint to avoid warnings under 'make test'. There were some uninitialized variable warnings, nothing serious. 'make test' will run perl under the warnings pragma, so 'use warnings' in your module will help you catch this sort of thing early. One more kind of pedantic thing: I don't know what editor you use, but it's been the norm for marc/perl module folks to not embed tabs in source code for indentation. vim and emacs both support mapping a tab to spaces when you hit the tab key. The marc/perl code uses four spaces for indentation: if you need help getting your editor to do this indentation let me know. -- Ed Summers aim: inkdroid web: http://www.inkdroid.org He who binds to himself a joy Does the winged life destroy; But he who kisses the joy as it flies Lives in eternity's sun rise. [William Blake]
Re: MARC::Lint update
On Mon, Jan 24, 2005 at 08:37:41AM -0600, Bryan Baldus wrote: I generally 'use warnings' or use the -w flag in the modules and scripts I've been writing. I didn't notice it was missing. I need to add strict and warnings to CodeData, as well. In modules/package files, is it practice to leave out the shebang (#!perl) line, since the file is not generally executed directly? If so, is that the reason for 'use warnings' vs. -w? Yeah. I use BBEdit Lite, which has a good global search/replace function. In the future, I'll try to remember to convert the indentation tabs to 4 spaces per tab. Are non-indentation tabs ok? In MARC::Lint::CodeData, I used split on \t to split the codes into a hash. Since some codes have or need spaces, splitting on \s would probably not work as well. Non-indentation tabs should be fine. There shouldn't be any need to remember to search/replace: try Edit -> Text Options and check Auto-Expand Tabs ... and you should be good to go. I guess for converting the existing tabs search/replace will come in handy though :-) //Ed
Re: listserv vs. Google Group
On Wed, Mar 23, 2005 at 11:11:22AM -0600, Doran, Michael D wrote: The fact that perl4lib postings also go to Google Groups should at least be mentioned in the WELCOME to perl4lib@perl.org automated subscription response (preferably at the top). Nothing was in there as of my July 30, 2003 WELCOME message. Neato, it's news to me that perl4lib is archived in google groups. You should probably know it is in the mail-archive [1] as well, which is mentioned on the perl4lib homepage. You would do best to direct your concerns to Ask Bjørn Hansen (ask at develooper dot com) about the boilerplate since that is under his control. My perspective on this is that listservs, usenet and blogs are converging, and I don't tend to alter my delivery when using them. If you are concerned about personal information getting out there I think you are best served by removing the information from your signature. Of course this doesn't help you with the information that's already out there...unless you take drastic action like moving and changing your name :-) At least Google does you the favor of obscuring your email address, which is nice from a spam standpoint, and is more than archives like web4lib/xml4lib do for you. //Ed [1] http://www.mail-archive.com/perl4lib%40perl.org/ -- Ed Summers aim: inkdroid skype: inkdroid web: http://www.inkdroid.org Give and ye shall receive. [Bram Cohen]
Re: Corrupt MARC records
I wondered if any of you had run into similar problems, or if you had any thoughts on how to tackle this particular issue. It's ironic that MARC::Record *used* to do what Andrew suggests: using split() rather than substr() with the actual directory lengths. The reason for the switch was just as Andrew pointed out: the order of the tags in the directory is not necessarily the order of the field data. If you need to you could try downloading MARC::Record v1.17 and try using that. Or you could roll your own code and cut and paste it everywhere like Andrew ;-) //Ed
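For background, each entry in a MARC record directory is 12 bytes -- a 3-byte tag, a 4-byte field length, and a 5-byte starting offset -- which is why the substr()-style approach works when the directory is honest. A small sketch (the entry value is made up for illustration):

```perl
use strict;
use warnings;

# Decode one 12-byte MARC directory entry:
# tag (3 bytes), field length (4 bytes), starting offset (5 bytes).
my $entry = '245008900000';
my ($tag, $len, $off) = unpack('A3 A4 A5', $entry);
# $tag is '245', $len is '0089', $off is '00000'
```

When a record's directory lengths don't match the actual field data (the corruption described above), these numbers lie, and split()-on-field-terminator is the only way to recover -- at the cost of trusting the directory's tag order.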
Re: MARC-8 to UTF-8 conversion
Ok, this is great information to have moving forward with the next MARC::Charset...many thanks Michael and Jason. Michael, you are totally right, the installer really shouldn't fail like that...I'd never tested it on a system that lacked DB_File so I didn't know. And CPAN testers didn't pick it up either. //Ed
Re: MARC-8 to UTF-8 conversion
Am I right that this amounts to less than 1Meg (EastAsian.db + UTF8.db)? Depending on your system and your needs (more speed?), that may not be considered large and might fit into memory fine. Otherwise, I think any of the in-core (non-DB_File) DBM files ought to suffice for that amount of data. Which in-core dbm modules are these? I thought DB_File was the de facto standard for doing this... As for the memory, it's not really the memory which I'm concerned about as much as it is the time it would take to build the database every time someone used the MARC::Charset module. Perhaps I'm falling victim to the curse of premature optimization again though. If I had a text file of 16,000 mappings that was read in every time someone did a: use MARC::Charset; would people be put out? I imagine folks in mod_perl environments wouldn't care too much--although a MB of ram for each apache process has a way of adding up. At least for high volume sites. //Ed
Code4lib 2006 Conference – Registration Now Open
Registration is now open for Code4lib 2006. Code4lib 2006 is a loosely structured conference for library technologists to commune, gather/create/share ideas and software, be inspired, and forge collaborations. It is also an outgrowth of the Access HackFest, wrapped into a conference-ish format. It is the event for technologists building digital libraries and digital information systems, tools, and software. Code4lib 2006 will be held in Corvallis, Oregon, 15-17 February (2006). More information on the conference, including the draft schedule, call for proposals, more detailed logistics information, and the online registration form can be found at the conference website: http://code4lib.org/2006
Re: Using PPM to install MARC-XML?
On 1/11/06, Sperr, Edwin [EMAIL PROTECTED] wrote: Well this is odd: C:\Documents and Settings\esperr>ppm install marc-xml Installing package 'marc-xml'... Error installing package 'marc-xml': Read a PPD for 'marc-xml', but it is not intended for this build of Perl (MSWin32-x86-multi-thread) Would bumping up a version help? (I'm on Active State 5.6) Upgrading is probably a good idea. I have a feeling your multi-thread build is somehow preventing you from installing marc-xml -- but I really don't know. An alternative to upgrading might be installing the module manually using Windows nmake. nmake is essentially make for Windows and would allow you to download a package directly from CPAN, unpack it, and install it with:

perl Makefile.PL
nmake
nmake test
nmake install

The only disadvantage to this is that you'll need to grab the dependencies yourself...but you might be able to use ppm for them. The advantage is you don't have to wait for ActiveState to create a ppm...and can get stuff hot off the presses (if you're that sort of person). John Bokma has some good instructions for using nmake on Windows with CPAN modules [1]. //Ed [1] http://johnbokma.com/perl/make-for-windows.html
Re: installing from MARC-Lint or Errorchecks from CPAN
When I downloaded the tarball and installed manually I noticed that the MANIFEST references a META.yml file, but the tarball doesn't include one. Perhaps this is somehow choking up CPAN? The 'make dist' command should generate a META.yml file for you. I would ask on the cpan-discuss [1] list to see if you can get any guidance from the CPAN admins though. It might be totally unrelated to META.yml. //Ed [1] http://lists.cpan.org/showlist.cgi?name=cpan-discuss
Re: Unimarc, marc21, Unicode, and MARC::File::XML
On 3/16/06, Mike Rylander [EMAIL PROTECTED] wrote: Will some brave soul please test this with some UNIMARC records and let me know how it goes? Yes please, add the test to the test suite if possible Joshua and Paul. miker_++ //Ed
Re: MARC::File::XML 0.85
I apologize, but I'm finding it hard to trace what exactly this script is doing. I did take a look at the first failure and sure enough the record leader says it's 463 bytes but the record itself is 464 bytes. So a failure is warranted -- given the current behavior of MARC::Record. Perhaps dumbing this test script down a bit and making it clear what the heck is being tested would help (at least this developer). //Ed
MARC::Charset v0.97 (important bugfix release)
If what follows seems boring and you use MARC::Charset with any regularity, just upgrade MARC::Charset to v0.97. If you are interested in knowing why, read on... Thanks for the details [1] Michael. You've uncovered a rather nasty bug in MARC::Charset >= v0.8. MARC::Charset::Compiler processes LC's MARC8/Unicode mapping file [2], but was not handling the fact that two character mappings (out of 16398) lacked a recommended ucs mapping, and relied on an alt instead. For example:

<code>
  <isCombining>true</isCombining>
  <marc>EC</marc>
  <ucs/>
  <utf-8/>
  <alt>FE21</alt>
  <altutf-8>EFB8A1</altutf-8>
  <name>LIGATURE, SECOND HALF / COMBINING LIGATURE RIGHT HALF</name>
  <note>...</note>
</code>

The result of this is that nulls were getting sprinkled in marc8_to_utf8 results if your data happened to contain either: - DOUBLE TILDE, SECOND HALF / COMBINING DOUBLE TILDE RIGHT HALF - LIGATURE, SECOND HALF / COMBINING LIGATURE RIGHT HALF The good news is that MARC::Charset v0.97 has been released to CPAN with a fix to use alt when ucs is not available. The bad news is that if you've used MARC::Charset to convert to utf8 you may have nulls too. I'm sorry :-( If you use MARC::Charset **PLEASE** upgrade to v0.97 immediately. Thanks Michael O'Connor for noticing the bug, and Mike Rylander for the fix. Also going out in this release is a fix from Mike Rylander to allow \r and \n to pass unchanged through marc8_to_utf8 since \r and \n are reported to pop up occasionally in unimarc data. //Ed [1] http://www.nntp.perl.org/group/perl.perl4lib/2007/05/msg2507.html [2] http://www.loc.gov/marc/specifications/codetables.xml
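The fix boils down to falling back to alt when ucs is empty; a simplified sketch (the hash fields are assumed to mirror the codetables XML elements, not MARC::Charset::Compiler's actual internals):

```perl
use strict;
use warnings;

# Simplified mapping record for the problem character above.
# Field names are assumed to mirror the codetables.xml elements.
my %mapping = (
    marc => 'EC',      # MARC-8 code point (hex)
    ucs  => '',        # no recommended UCS mapping...
    alt  => 'FE21',    # ...but an alternate mapping exists
);

# Pre-fix behavior: an empty ucs value effectively became chr(0), a null.
# Post-fix behavior: fall back to alt when ucs is missing.
my $codepoint = $mapping{ucs} || $mapping{alt};   # 'FE21'
my $char      = chr(hex($codepoint));             # U+FE21
```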
Re: script stresses system
Can you post said script, or send us a URL for it? //Ed
Re: problem with MARC::File::XML on RH 5 64bits
It looks like you don't have an XML parser installed that supports the features that M::F::X requires: use XML::SAX qw(Namespaces Validation); Try executing that, and see if you get a similar exception. FWIW Namespace support is required for the version of MARC::File::SAX that is in CVS since it now uses LocalName rather than Name to determine a tag name. //Ed
Re: Ready for MARC::File::XML release? (was [Patch] Escape marc tag/code/indicators in Marc::File::XML)
+1 Thanks for working on this Galen. //Ed On Sun, Jul 26, 2009 at 8:54 PM, Galen Charlton gmcha...@gmail.com wrote: Hi, On Wed, Jul 22, 2009 at 5:04 PM, Dan Scott deni...@gmail.com wrote: It would be nice to see the 0.91 release get pushed out the door, in any case. 0.88 was a long time ago. Any objections to my pushing out 0.91 as a bugfix release? I've applied Bill's patch and addressed the one CPAN bug against 0.88. Any more patches still sitting on the floor? While I intend to go over MARC::File::XML with a fine-toothed comb some time in the next few weeks, I figure that the results will be better targeted to a 1.0 release. Regards, Galen
Re: Marc::XML with MARC21
Hi Michele: I copied and pasted the XML from your email and ran it through a simple test script (both attached) and the record seemed to be parsed ok. What do you see if you run the attached test.pl? //Ed test.pl Description: Binary data

<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:leader>^cam^^22^^i^4500</marc:leader>
  <marc:controlfield tag="001">000762662</marc:controlfield>
  <marc:datafield tag="020" ind1=" " ind2=" "><marc:subfield code="a">8814075913</marc:subfield></marc:datafield>
  <marc:datafield tag="040" ind1=" " ind2=" "><marc:subfield code="a">IT</marc:subfield><marc:subfield code="-">Servizio Bibliotecario Senese</marc:subfield><marc:subfield code="e">RICA</marc:subfield></marc:datafield>
  <marc:datafield tag="300" ind1=" " ind2=" "><marc:subfield code="a">VI, 262 p. ;</marc:subfield><marc:subfield code="c">24 cm</marc:subfield></marc:datafield>
  <marc:datafield tag="653" ind1="0" ind2=" "><marc:subfield code="a">Navigazione da diporto</marc:subfield><marc:subfield code="a">Legislazione</marc:subfield></marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" "><marc:subfield code="a">Antonini,Alfredo</marc:subfield></marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" "><marc:subfield code="a">Morandi,Francesco</marc:subfield></marc:datafield>
  <marc:datafield tag="041" ind1="0" ind2=" "><marc:subfield code="a">ita</marc:subfield></marc:datafield>
  <marc:datafield tag="245" ind1="1" ind2="0"><marc:subfield code="a">La navigazione da diporto :</marc:subfield><marc:subfield code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilità :</marc:subfield><marc:subfield code="b">atti del convegno, Trieste, 27 marzo 1998 /</marc:subfield><marc:subfield code="c">a cura di Alfredo Antonini e Francesco Morandi</marc:subfield></marc:datafield>
  <marc:datafield tag="260" ind1=" " ind2=" "><marc:subfield code="a">Milano :</marc:subfield><marc:subfield code="b">Giuffrè</marc:subfield><marc:subfield code="c">1999</marc:subfield></marc:datafield>
  <marc:datafield tag="490" ind1=" " ind2="0"><marc:subfield code="a">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield><marc:subfield code="p">Nuova serie ;</marc:subfield><marc:subfield code="v">0048</marc:subfield></marc:datafield>
  <marc:datafield tag="760" ind1="1" ind2=" "><marc:subfield code="t">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield><marc:subfield code="g">0048</marc:subfield></marc:datafield>
  <marc:datafield tag="082" ind1=" " ind2=" "><marc:subfield code="a">343.45096</marc:subfield><marc:subfield code="2">20</marc:subfield></marc:datafield>
  <marc:controlfield tag="008">^^sxx^|r^|||</marc:controlfield>
</marc:record>
Re: Marc::XML with MARC21
Hi Michele: Yes, I see a UTF-8 encoding error in that file when I try to check it with xmllint (from the libxml2 package): e...@curry:~/Downloads$ xmllint marc.xml marc.xml:1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE0 0x20 0x3A 0x3C ld code=ble infrastrutture, l' organizzazione, i contratti e le responsabilit This causes MARC::Record->new_from_xml to blow up too, with a somewhat unhelpful error: not well-formed (invalid token) at line 1, column 1533, byte 1533 at /usr/lib/perl5/XML/Parser.pm line 187 It looks like your xml file might be in ISO-8859-1 (at least that's what the unix file command told me): e...@curry:~/Projects/marc-xml$ file marc.xml marc.xml: ISO-8859 text, with very long lines, with no line terminators So you could try to convert your XML string with Encode before handing it off to MARC::Record->new_from_xml: use Encode; Encode::from_to($xml, 'iso-8859-1', 'utf-8'); I attached the full script which seems to work OK. Note, if you are on ubuntu it looks like they are a few versions back on their libmarc-xml-perl package (v0.88) instead of the latest on CPAN (v0.92) ... and v0.88 doesn't handle namespaces properly... //Ed
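Here is the same conversion as a self-contained sketch (the short byte string stands in for Michele's Latin-1 XML):

```perl
use strict;
use warnings;

# from_to() re-encodes a byte string in place.
use Encode qw(from_to);

# 0xE0 is 'à' in ISO-8859-1 -- the very byte xmllint choked on above.
my $xml = "responsabilit\xE0";
from_to($xml, 'iso-8859-1', 'utf-8');
# $xml is now valid UTF-8: the single 0xE0 byte became the pair 0xC3 0xA0.
```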
Re: Moving to Google Code/svn (was Re: [Patch] Escape marc tag/code/indicators in Marc::File::XML)
On Tue, Mar 16, 2010 at 11:13 PM, Galen Charlton gmcha...@gmail.com wrote: If there are no major objections, in a week's time I plan to make the CVS repo read-only and we'll move forward with Git. Hooray, thanks so much Galen! Sounds like a great plan moving forward. //Ed
marcpm git repository and email addresses
I was planning on rewriting the email addresses used in the git repository, so that when we push the code elsewhere (like github) our identities work properly. I went through and pulled out the users that have committed to the cvs repository and mapped them to what I thought was the common email address for each user. If you see yourself mentioned below and would like a different email address just let me know.

dereklane derekl...@pobox.com
edsummers e...@pobox.com
eijabb eij...@cpan.org
gmcharlt gmcha...@gmail.com
joshferraro j...@liblime.com
ltjake bri...@cpan.org
miker-pines mi...@esilibrary.com
mjordan mjor...@sfu.ca
morbus mor...@disobey.com
moregan more...@flr.follett.com
petdance a...@petdance.com

Also, you'll want to set up your git profile appropriately. So in my case:

git config --global user.name "Ed Summers"
git config --global user.email e...@pobox.com

//Ed
Re: MARC-perl: different versions yield different results
Hi Leif, Is the downside to this approach that you are modifying a CPAN module in place, or is it something to do with the behavior of 'use bytes'? Would there be any undesirable side effects to adding 'use bytes' to MARC::File::USMARC::encode on CPAN? //Ed On Tue, Oct 12, 2010 at 7:58 AM, Leif Andersson leif.anders...@sub.su.se wrote: Myself I have changed one of the modules, MARC::File::USMARC. It has a function called encode() around line 315. I have added a use bytes; just before the final return. Like this:

use bytes;
return join("", $marc->leader, @$directory, END_OF_FIELD, @$fields, END_OF_RECORD);

To change directly in code like this is totally a no-no to many programmers. If you feel uncomfortable with this, there are other methods doing the same stuff. You could write a package:

package MARC_Record_hack;

use MARC::File::USMARC;

no warnings 'redefine';
sub MARC::File::USMARC::encode() {
    my $marc = shift;
    $marc = shift if (ref($marc)||$marc) =~ /^MARC::File/;
    my ($fields, $directory, $reclen, $baseaddress) =
        MARC::File::USMARC::_build_tag_directory($marc);
    $marc->set_leader_lengths( $reclen, $baseaddress );
    # Glomp it all together
    use bytes;
    return join('', $marc->leader, @$directory, "\x1E", @$fields, "\x1D");
}
use warnings;

1;
__END__
Re: MARC-perl: different versions yield different results
On Tue, Oct 12, 2010 at 9:05 AM, Leif Andersson leif.anders...@sub.su.se wrote: To sum up. I think it is a good idea to make the MARC blob a binary object, so to speak. I don't know if you should just apply my simple hacks to CPAN code, or if a thorough re-write of some parts of the modules is called for. Those changes may involve some changes in coding styles in the scripts that use MARC::Record. But probably all you have to do is to remove all that strange code you put in there as workarounds to the character bugs. And yes, I have been using MARC::Charset in combination with this technique, without any problems that I can recall. :-) I no longer use MARC::Record or MARC::Charset much in my line of work. But I know that others on the list do. I encourage you all to clone the git repository from sourceforge (maybe even move it to GitHub?) and work on making it better...instead of having to hack at the code locally every time there's a new release. //Ed
Re: MARC::Charset 1.33 released
Galen, thanks very much for continuing to develop MARC::Charset. You should feel free to update the Makefile.PL and README to list you as the author now, since you have taken an active role in maintaining it. One of the most gratifying parts of my work as a software developer has been seeing these MARC/Perl modules being maintained by people like you :-) //Ed On Thu, Aug 4, 2011 at 11:47 PM, Galen Charlton gmcha...@gmail.com wrote: Hi, I have uploaded version 1.33 of MARC::Charset to CPAN: 1.33 Thu Aug 4 23:25:14 EDT 2011 - move build_db() to separate .PL script so that module can be built even if Class::Accessor and other dependencies aren't available before Makefile.PL is run. - list GDBM_File as an explicit dependency, as some distributions like ActivePerl don't include it even though it is a core module. My thanks to Jon Gorman for giving me an impetus to look at how well this module does and does not build on Windows Perl distributions. As of now, MARC::Charset should build cleanly on Strawberry Perl. ActiveState Perl likes MARC::Charset less, as MARC::Charset depends on GDBM_File, a core Perl module that inexplicably isn't included in ActivePerl. Regards, Galen -- Galen Charlton gmcha...@gmail.com
Re: Finding all the Perl books
On Tue, Nov 8, 2011 at 9:32 AM, Jon Gorman jonathan.gor...@gmail.com wrote: First, on the Library of Congress data, Internet Archive has a snapshot of the LoC information from 2007. It was collected by the Scriblio project http://www.archive.org/details/marc_records_scriblio_net. There are also some other record collections at the Archive that contain MARC records. There are some good MARC libraries in Perl. It's not widely known, but Internet Archive also subscribes to the weekly updates from LC (I believe going back to the Scriblio purchase) and makes them available on the Web (god bless 'em): http://www.archive.org/details/marc_loc_updates I believe all LoC records are present in WorldCat, except for the catalog records that aren't in electronic form :-) I seem to remember there was an impoverished search API that OCLC offers to the general public, and that the nice one is reserved for OCLC subscribers. You could use the SRU module with LC's SRU endpoint: http://z3950.loc.gov:7090/voyager?operation=explain But, depending on what you are doing, I would probably be content to sift through the 481 hits in Google Books and call it a day :-) https://www.googleapis.com/books/v1/volumes?q=perl //Ed
Re: Permission to translate your page at http://marcpm.sourceforge.net/
Hi Anja, Sorry for the delay. Yes, please feel free to translate it and make it available. I think it's awesome that you want to! I am cc'ing the perl4lib mailing list, where people continue to talk about MARC::Record and related modules. Best wishes, //Ed On Tue, Feb 5, 2013 at 6:39 AM, Anja Skrba an...@webhostinggeeks.com wrote: Dear Sir, My name is Anja and I am a Computer Science student at the University of Belgrade, Serbia. I found your web page about the machine-readable cataloging format to be of significant use for the community. Here is the URL of your page: http://marcpm.sourceforge.net/MARC/Doc/Tutorial.html I was wondering if I could translate it into the Serbo-Croatian language and post it on my web site? See, my purpose is to help people from Ex Yugoslavia (Serbia, Montenegro, Croatia, Bosnia and Herzegovina, Slovenia and Macedonia) to better understand some very useful information about computer science. I hope I'll hear from you soon. Many kind regards, Anja Skrba http://science.webhostinggeeks.com/ an...@webhostinggeeks.com Tel: +381 62300604
Re: Permission to translate your page at http://marcpm.sourceforge.net/
Hi Anja, Is your translation available as POD? I think it would make a nice companion to the Tutorial by adding it as Tutorial_hrv.pod to Git [1]. I picked hrv because it is the ISO 639 code for Croatian (the language that Google thinks your page is written in). If there is a better code to use you could change it. The advantage of adding it to the package is that you could reference it on CPAN, like the existing Tutorial [2]. Your tutorial would be mirrored to all the other CPAN sites, and would stand little chance of being lost :-) I'm cc'ing perl4lib to let Galen know, since he is actively maintaining MARC::Record these days. //Ed [1] https://github.com/gmcharlt/marc-perl/tree/master/marc-record/lib/MARC/Doc [2] http://search.cpan.org/~gmcharlt/MARC-Record/lib/MARC/Doc/Tutorial.pod On Tue, Mar 5, 2013 at 8:30 AM, Anja Skrba an...@webhostinggeeks.com wrote: Hi Ed, I was wondering if you got my last email with the link to the translation? It would really mean a lot to me if you could take a look and tell me what you think. Thanks a lot, Anja Skrba Anja Skrba an...@webhostinggeeks.com http://science.webhostinggeeks.com/ Tel: +38162300604 On Mon, Feb 25, 2013 at 10:34 AM, Anja Skrba an...@webhostinggeeks.com wrote: Dear Ed, As per our earlier conversation, I have finished the translation. Please review it here: http://science.webhostinggeeks.com/marc I've referenced your original article at the top and bottom of the page. I would really appreciate it if you could place a link to my translation on your website. That would honor my work and bring a lot of visitors from Ex Yugoslavia to both your website and mine. To make it easier for you, here's the HTML code that you can add somewhere within your article: This article is translated to <a href="http://science.webhostinggeeks.com/marc">Serbo-Croatian</a> language by Anja Skrba from <a href="http://webhostinggeeks.com/">Webhostinggeeks.com</a>.
Please let me know if there are any questions or if you need me to correct something. Keep in touch! Thanks, Anja Anja Skrba an...@webhostinggeeks.com http://science.webhostinggeeks.com/ Tel: +38162300604 On Wed, Feb 20, 2013 at 2:19 PM, Anja Skrba an...@webhostinggeeks.com wrote: Hi Ed, Thanks for your reply. As soon as I finish the translation I will send it to you, so you can review it. Of course, I will include a reference to the original page. Keep in touch, Anja Skrba Anja Skrba an...@webhostinggeeks.com http://science.webhostinggeeks.com/ Tel: +38162300604
Re: Permission to translate your page at http://marcpm.sourceforge.net/
I forgot to add that, if you need help converting your translation to POD and getting it into Git, I would be happy to work with you on that. //Ed
Re: Permission to translate your page at http://marcpm.sourceforge.net/
Hi Anja, If POD [1] is new to you I might be asking you to do a bit too much. The main thing is taking your translated text and moving it into a format that looks like: https://raw.github.com/gmcharlt/marc-perl/master/marc-record/lib/MARC/Doc/Tutorial.pod and then committing it back to the Git repository. I think this is something I can do for you if you want. But if you have the interest and time to give it a try, I could help you. We can use the perl4lib@perl.org discussion list [2] to coordinate things too. //Ed [1] http://perldoc.perl.org/perlpod.html [2] http://perl4lib.perl.org/ On Thu, Mar 7, 2013 at 7:54 AM, Anja Skrba an...@webhostinggeeks.com wrote: Hi Ed, I have never done this before, so you'll have to explain to me how to do it :) Anja Skrba an...@webhostinggeeks.com http://science.webhostinggeeks.com/ Tel: +38162300604 On Wed, Mar 6, 2013 at 10:57 AM, Ed Summers e...@pobox.com wrote: I forgot to add that, if you need help converting your translation to POD and getting it into Git I would be happy to work with you on that. //Ed
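As a sketch of what Ed is describing, a translated tutorial file could use the same POD structure as the English Tutorial.pod. The headings and text below are illustrative placeholders, not the actual translation:

```pod
=head1 NAME

MARC::Doc::Tutorial_hrv - Serbo-Croatian translation of the MARC::Record tutorial

=head1 INTRODUCTION

(Translated text of the tutorial's introduction goes here as plain
paragraphs. Verbatim code samples are indented with at least one
space, just as in the original Tutorial.pod.)

    use MARC::Record;

=cut
```

A file like this, dropped into marc-record/lib/MARC/Doc/ in the Git repository, would be rendered and mirrored by CPAN alongside the existing Tutorial.pod.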