Re: reading and writing of utf-8 with marc::batch

2013-03-26 Thread Leif Andersson
Hi Eric,

my first guess would be that your terminal is not set to UTF-8.
If you comment out
#binmode( STDOUT, ':utf8' );
and that does the trick, then you can start looking into how to change your 
terminal settings.
(And that can sometimes be a rather frustrating task, I'm afraid)
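A minimal sketch, in core Perl with no MARC modules, of what an :utf8 output layer does to a string that already holds UTF-8 bytes: each byte of 0x80 or above is treated as a separate Latin-1 character and encoded a second time. Whether Eric's script hits exactly this case depends on whether the record strings it prints are byte strings; the helper name write_with_layer is made up for the illustration.

```perl
use strict;
use warnings;

# Bytes as they sit in a UTF-8 encoded MARC file: "é" is the two bytes C3 A9.
my $raw = "Archiconfr\x{C3}\x{A9}rie";

# Print $data through the given output layer into an in-memory "file"
# and return the bytes that were actually written.
sub write_with_layer {
    my ($layer, $data) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "cannot open in-memory file: $!";
    print {$fh} $data;
    close($fh) or die "cannot close in-memory file: $!";
    return $buffer;
}

my $plain   = write_with_layer(':raw',  $raw);   # bytes pass through untouched
my $doubled = write_with_layer(':utf8', $raw);   # C3 and A9 are each encoded again

# Show both results as hex so the extra bytes are visible.
printf "%-5s %s\n", $_->[0], unpack('H*', $_->[1])
    for ['raw', $plain], ['utf8', $doubled];
```

If the :raw line shows c3a9 and the :utf8 line shows c383c2a9, the bytes were encoded twice, which is what a terminal that really is UTF-8 would display as "Ã©".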

/Leif Andersson
Stockholm UL

From: Eric Lease Morgan [emor...@nd.edu]
Sent: 26 March 2013 21:22
To: perl4lib@perl.org
Subject: reading and writing of utf-8 with marc::batch

For the life of me I can't figure out how to do reading and writing of UTF-8 
with MARC::Batch.

I have a UTF-8 encoded file of MARC records. Dumping the records and grepping 
for a particular string shows that the data is valid:

  $ marcdump und.marc | grep Sainte-Face
  und.marc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610 20 _aArchiconfrérie de la Sainte-Face
  13000 records
  $

I then run a Perl script that simply reads each record and dumps it to STDOUT. 
Notice how I define both my input and output as UTF-8:

  #!/shared/perl/current/bin/perl

  # configure
  use constant MARC => './und.marc';

  # require
  use strict;
  use MARC::Batch;

  # initialize
  binmode ( MARC, ':utf8' );
  my $batch = MARC::Batch->new( 'USMARC', MARC );
  $batch->strict_off;
  $batch->warnings_off;
  binmode( STDOUT, ':utf8' );

  # read & write
  while ( my $marc = $batch->next ) { print $marc->as_usmarc }

  # done
  exit;

But my output is munged:

  $ ./marc.pl > und.mrc
  $ marcdump und.mrc | grep Sainte-Face
  und.mrc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610_aArchiconfrérie de la Sainte-Face
  13000 records
  $

What am I doing wrong!?

--
Eric Lease Morgan
University of Notre Dame

574/631-8604

Re: MARC::Charset 1.34

2013-02-11 Thread Leif Andersson
It gunzips fine, but then there seems to be something wrong with the tar file...

/Leif Andersson
Stockholm University Library

Re: MARC::Charset 1.34

2013-02-11 Thread Leif Andersson
Corrupt tar file RESOLVED.

But here's the background anyway.

I downloaded the MARC-Charset-1.34.tar.gz to Windows7.

"The archive is corrupt" was the error message from WinRAR on Windows.
Other utils on the same platform agreed.

I made a second download with the same result.

Now - third try - it suddenly works!

So either something happened during the first two transfers,
or some mirror somewhere has a corrupt copy
(and my downloads happened to use that mirror).

/Leif








From: Galen Charlton [gmcha...@gmail.com]
Sent: 11 February 2013 20:01
To: Leif Andersson
Cc: perl4lib
Subject: Re: MARC::Charset 1.34

Hi,

On Mon, Feb 11, 2013 at 10:50 AM, Leif Andersson 
<leif.anders...@sub.su.se> wrote:
It gunzips fine, but then there seems to be something wrong with the tar file...

Could you elaborate?  In particular, what platform are you on and what error 
message are you getting?

I tried installing MARC::Charset 1.34 via a 'cpan MARC::Charset' on a fresh 
Debian box, and it worked for me.

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com


Re: Anybody know what this USMARC.pm error is?

2011-05-30 Thread Leif Andersson
I am not sure I really got this.
Because if I did:

- The error messages in your original posting were not the exact error messages.
- The original posted code was not the actual code producing the errors
- And the sample MARC records, supplied to demonstrate the errors, were 
actually OK

Sometimes you are lucky :-)
Glad you solved the case.

/Leif

Re: MARC blob to MARC::Record object

2011-01-10 Thread Leif Andersson
I hope you will forgive me for a late addendum.
Not only do I have to apologize for the late arrival of this post, I also 
should apologize for its (lack of) seriousness.
Actually - this is in every respect just a programming scherzo, so to speak. 
(Even though the code below works, at least for me)
Now you are all warned. ;-)

So: if you are used to letting MARC::Batch read the records from a file, then 
you can simply read from your database (i.e. your statement handle) as if you 
were reading from a file.
Like this:

<code>
#!/usr/local/bin/perl -w
use DBI;
use MARC::Batch;
use strict;
#BEGIN {
#$ENV{NLS_LANG} =  ...;
#}
my $dbh = DBI->connect(...) || die 1;
# LongReadLen caps how many bytes of a blob DBI fetches; it must cover the longest record
$dbh->{LongReadLen} = 9;
$dbh->{LongTruncOk} = 0;
my $sql = q( select GetBibBlob(bib_id) from bib_master where rownum <= 3 );
my $sth = $dbh->prepare($sql) || die 2;
my $rv  = $sth->execute() || die 3;
# add some magic:
tie(*MARC, 'dbfile', $sth);
# pass the virtual filehandle to MARC::Batch
my $batch = MARC::Batch->new('USMARC', *MARC );
$batch->strict_off;
# read as usual
while ( my $marc = $batch->next ) {
    print $marc->as_formatted(), "\n\n";
}

#---
package dbfile;
use strict;
sub TIEHANDLE {
    my ($class, $sth) = @_;
    my $i = { 'sth' => $sth,
              'eof' => 0, };
    bless $i, $class;
}
sub READLINE {
    # one row = one MARC blob
    my ($marc) = $_[0]->{sth}->fetchrow_array();
    if (defined $marc) {
        # leader/00-04 holds the record length; drop anything after the record
        my $len = substr($marc,0,5);
        return substr($marc,0,$len);
    }
    else {
        $_[0]->{'eof'} = 1;
        return undef;
    }
}
sub EOF {
    # eof()
    $_[0]->{'eof'};
}
sub FILENO {1}
sub BINMODE {1}
sub CLOSE {1}
sub DESTROY {1}
__END__
</code>

That's all folks,

/Leif
Leif Andersson, Systems Librarian
Stockholm University Library
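The same tie technique can be tried without a database. In this sketch a hypothetical package, ArrayOfRecords, stands in for the dbfile package and hands out blobs from an in-memory list instead of calling fetchrow_array on a DBI statement handle; READLINE trims each blob to the length in leader positions 00-04, as the dbfile version does. The two "records" are fake and carry only a length prefix and a record terminator.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the DBI statement handle: READLINE hands out one
# blob per call, trimmed to the record length given in the leader.
package ArrayOfRecords;

sub TIEHANDLE {
    my ($class, @blobs) = @_;
    return bless { blobs => [@blobs] }, $class;
}

sub READLINE {
    my ($self) = @_;
    my $blob = shift @{ $self->{blobs} };
    return unless defined $blob;      # undef in scalar context ends the read loop
    my $len = substr($blob, 0, 5);    # leader/00-04: record length
    return substr($blob, 0, $len);    # drop any padding after the record
}

sub EOF     { return !@{ $_[0]{blobs} } }
sub BINMODE { return 1 }
sub CLOSE   { return 1 }

package main;

# Two fake records: a 5-digit length (terminator included), payload, 0x1D.
tie *RECORDS, 'ArrayOfRecords', "00013ABCDEFG\x{1D}padding", "00011XYZWV\x{1D}";

my @read;
while (my $rec = <RECORDS>) {
    push @read, $rec;
}
print scalar(@read), " records read\n";
```

In the real script the tied glob is handed to MARC::Batch->new('USMARC', *MARC), which then reads from it as from any filehandle.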


From: Doran, Michael D [do...@uta.edu]
Sent: 7 January 2011 15:11
To: Leif Andersson; 'Jon Gorman'; perl4lib
Subject: RE: MARC blob to MARC::Record object

Hi Leif and Jon,

 use MARC::Record;
 ...
 my $record = MARC::Record->new_from_usmarc( $blob );

This works!

 From: Jon Gorman [mailto:jonathan.gor...@gmail.com]
 Sent: Friday, January 07, 2011 7:51 AM
 You'll probably think of this when you get up, but did you make sure
 to import the package? i.e. "use MARC::File::USMARC;"?

This made the other way work, too! (I had only "use MARC::File")

Much thanks to Leif and Jon.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/


Re: MARC blob to MARC::Record object

2011-01-07 Thread Leif Andersson
Hi Michael,

this is how I - in principle - usually do it:

use MARC::Record;
...
my $record = MARC::Record->new_from_usmarc( $blob );

/Leif

Leif Andersson, Systems librarian
Stockholm University Library
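new_from_usmarc can work straight from a variable because a USMARC blob describes itself: leader positions 00-04 give the record length, 12-16 the base address of the data, and the directory between the leader and the base address holds a 12-byte entry (tag, length, starting position) per field. The sketch below, with a made-up helper name marc_fields, walks that structure in core Perl to show the idea; it is not MARC::Record's code and does no validation beyond the length checks. It assumes the blob is a byte string.

```perl
use strict;
use warnings;

# Split a raw USMARC record (a byte string) into [tag, data] pairs.
# Returns an empty list if the leader's record length does not match the blob.
sub marc_fields {
    my ($blob) = @_;
    my $reclen = substr($blob, 0, 5);    # leader/00-04: record length
    my $base   = substr($blob, 12, 5);   # leader/12-16: base address of data
    return () unless $reclen =~ /^\d{5}$/ && $base =~ /^\d{5}$/
                  && length($blob) == $reclen;

    # The directory runs from the end of the leader to the field terminator before $base.
    my $directory = substr($blob, 24, $base - 24 - 1);
    my @fields;
    for (my $pos = 0; $pos + 12 <= length $directory; $pos += 12) {
        my ($tag, $len, $start) = unpack('a3 a4 a5', substr($directory, $pos, 12));
        # Each field length includes its trailing field terminator (0x1E); drop it.
        push @fields, [ $tag, substr($blob, $base + $start, $len - 1) ];
    }
    return @fields;
}
```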

From: Doran, Michael D [do...@uta.edu]
Sent: 7 January 2011 00:18
To: perl4lib
Subject: MARC blob to MARC::Record object

I am working on a Perl script that retrieves data from our Voyager ILS via an 
SQL query.  Among other data, I have MARC records in blob form, and the script 
processes one MARC record at a time.  I want to be able to parse and 
modify/convert the MARC record (using MARC::Record) before writing/printing 
data to a file.

How do I make the MARC blob into a MARC::Record object (without having to first 
save it to a file and read it in with MARC::File/Batch)?  The MARC blob is already 
in a variable, so it doesn't make sense (to me) to write it out to a file just 
so I can read it back in.  Unless I have to, natch.

I apologize if I am missing something obvious.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

Re: Moose based Perl library for MARC records

2010-11-09 Thread Leif Andersson
Frédéric,

Just out of curiosity - what was your main motivation for writing another MARC 
module?
In what ways does your distribution differ from MARC::Record?

/Leif

Re: MARC-perl: different versions yield different results

2010-10-13 Thread Leif Andersson
Hi Galen,

Let me tell you I really appreciate the work you and many others have put into 
the MARC::Record suite.
I don't quite consider myself a programmer. I just happen to do some of my work 
by taking advantage of the programming resources that are available to me.

As for the patch, I am not sure my brute-force hack qualifies as one.
But for now I will leave that for others to decide.

/Leif


From: Galen Charlton [gmcha...@gmail.com]
Sent: 12 October 2010 17:35
To: Leif Andersson
Cc: Al; perl4lib@perl.org
Subject: Re: MARC-perl: different versions yield different results

Hi Leif,

On Tue, Oct 12, 2010 at 10:58 AM, Leif Andersson
<leif.anders...@sub.su.se> wrote:
 To change directly in code like this is totally no-no to many programmers.
 If you feel uncomfortable with this, there are other methods doing the same 
 stuff.

As it happens, this is the very mailing list where patches to MARC::*
are typically discussed.  Feel free to send one.

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com

Re: MARC-perl: different versions yield different results

2010-10-12 Thread Leif Andersson
This has nothing to do with Perl versions.

MARC::Record 1.38 and earlier does not display this problem.
MARC::Record 2.0.0, the so-called Unicode version, introduced the problem you 
describe: when writing records, it produces an incorrect leader length and 
corrupted UTF-8.

There are different ways to deal with this.
Myself I have changed one of the modules.

MARC::File::USMARC
It has a function called encode() around line 315
I have added a "use bytes;" just before the final return. Like this:

use bytes;
return join("", $marc->leader, @$directory, END_OF_FIELD, @$fields, END_OF_RECORD);

Changing code directly like this is a total no-no to many programmers.
If you feel uncomfortable with it, there are other ways to achieve the same 
thing.
You could write a package:

package MARC_Record_hack;
use MARC::File::USMARC;
no warnings 'redefine';
sub MARC::File::USMARC::encode() {
    my $marc = shift;
    $marc = shift if (ref($marc)||$marc) =~ /^MARC::File/;
    my ($fields,$directory,$reclen,$baseaddress) = 
        MARC::File::USMARC::_build_tag_directory($marc);
    $marc->set_leader_lengths( $reclen, $baseaddress );
    # Glomp it all together
    use bytes;    # treat every piece as raw bytes while joining
    return join("", $marc->leader, @$directory, "\x1E", @$fields, "\x1D");
}
use warnings;
1;
__END__

With this package included, your original code should work fine, I'd 
guess.


use MARC::Batch;
use MARC_Record_hack;
my $batch = new MARC::Batch('USMARC', $ARGV[0]);
$batch->strict_off ();
$batch->warnings_off ();
#binmode( STDOUT, ':raw' );
#binmode STDOUT;
my $record = $batch->next;
print $record->as_usmarc;


As a habit I use 
binmode FH;
when I write records to a file.
It is not needed, but it keeps me from the temptation of making any other 
assumptions about character encodings.

/Leif Andersson
Stockholm University Library
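Why the leader goes wrong can be seen without MARC::Record. A minimal sketch, assuming the field text has been decoded into a Perl character string: length() counts characters, so a record or directory length computed that way comes up one byte short for every two-byte UTF-8 character. Under use bytes, length() counts Perl's internal bytes, which match the UTF-8 byte count for a decoded string like this one; encoding the text to UTF-8 first gives the same number without the pragma. (perldoc bytes discourages use bytes outside debugging, and for a string that was never upgraded it counts one byte per character.)

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same field text as raw UTF-8 bytes and as a decoded character string.
my $octets = "Archiconfr\x{C3}\x{A9}rie";   # 15 bytes: é is C3 A9
my $text   = decode('UTF-8', $octets);       # 14 characters, utf8 flag on

my $char_length    = length($text);                     # counts characters
my $bytes_length   = do { use bytes; length($text) };    # counts internal bytes
my $encoded_length = length(encode('UTF-8', $text));    # encode first, then count

printf "length: %d, use bytes: %d, encoded: %d\n",
    $char_length, $bytes_length, $encoded_length;
```

A leader written with the first number would claim one byte less than the record actually occupies on disk.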


From: Al [ra...@berkeley.edu]
Sent: 12 October 2010 00:03
To: perl4lib@perl.org
Subject: MARC-perl: different versions yield different results

Example marc record is here:
http://www.mediafire.com/file/u5cxkrfwh9ew09z/example.zip

When I process the record above in perl 5.8, MARC::Record version 1.38, and
Encode.pm version 2.12, the record comes out fine.

When I use perl 5.10, MARC::Record version 2.0.0, and Encode.pm 2.40 the
record comes out corrupted and MARC::Record will no longer read the result.

The problem is with a Unicode character (big surprise). The earlier version
leaves the \xC3A1 character intact, the later version changes it to \xE1
which is invalid. I've read as many of the perl4lib messages on the subject
of UTF-8 as I could but my eyes are spinning. I'm hoping by including a
complete but simple perl program and making a MARC record available that
somebody can explain to me in detail what is going on. My inclination is to
simply revert to the earlier version of perl but perhaps if I really
understood the issue that may not be necessary.

Here is the test program I use:

use MARC::Batch;
my $batch = new MARC::Batch('USMARC', $ARGV[0]);
$batch->strict_off ();
$batch->warnings_off ();
#binmode( STDOUT, ':utf8' );
my $record = $batch->next;
print $record->as_usmarc;

Run the program on the record, then run it again on the output and the
second time perl quits with an error:

utf8 "\xE1" does not map to Unicode at Encode.pm line 174.

That should not happen.

Why the different behavior with the different versions? I can't see
anything wrong with the original record - it's valid UTF8 as far as I can
tell. Leader byte 9 is correctly set to 'a'. Uncommenting the binmode line
seems to work - the character is output unchanged as is supposed to happen.
The problem is my record batches are a mixture of UTF8 and MARC8 and
explicitly setting binmode screws things up. I need a solution that
transparently handles a mix of record encodings.

I rather suspect the problem is with Encode.pm and not MARC perl but I
can't be sure. It also may be due to the way perl handles IO between
version 5.8 and 5.10. BTW the problem happens on Windows and Unix.

Thanks for any advice you can give me,

Al

Re: MARC-perl: different versions yield different results

2010-10-12 Thread Leif Andersson
Hi Ed,

Yes, I meant that the drawback is in modifying a CPAN module locally.
Actually, I don't know if there are any undesirable side effects.
None that I know of; I have used this technique myself for almost three years 
now.

The idea is that the MARC::Record object per se should be just binary.
The effort made in the leap from 1.38 to 2.0.0 to treat this blob as an 
(always well-formed!) utf8 string was, in my eyes, a mistake.

It has resulted in at least two common problems.
1. When writing records: the leader length / corrupted utf8 problem I responded 
to in my post.
2. When reading bad utf8 records: special care has to be taken so that your 
whole application does not just die at that record.

Almost all postings to this forum since 2.0.0 have been concerned with one of 
these problems.
(I exaggerate a little, but not much.)

Adding "use bytes" is a shortcut that avoids rewriting a whole bunch of code, 
which would probably be more aesthetically pleasing.
But that is obviously much more work...

And by the way, the second problem can be dealt with by changing marc_to_utf8 to:
sub MARC::File::Encode::marc_to_utf8 {
    return Encode::decode( 'UTF-8', $_[0], 0 );  # do NOT check if UTF-8 is valid!
}

Yes, that is also a hack!
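The difference between the two CHECK values can be seen in isolation. A small sketch using only Encode: with CHECK set to 1 (Encode::FB_CROAK), as in the MARC::File::Encode code, a stray Latin-1 byte makes decode() die; with 0 (Encode::FB_DEFAULT), as in the hack above, the bad byte is replaced by U+FFFD and processing continues. The sample string is made up.

```perl
use strict;
use warnings;
use Encode ();

# A field that claims to be UTF-8 but holds a stray Latin-1 byte (E9).
my $bad = "Caf\x{E9}";

# CHECK = 1 (FB_CROAK), as in MARC::File::Encode: decode() dies on the bad byte.
my $copy = $bad;    # decode() may modify its input when CHECK is set
my $strict_ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;

# CHECK = 0 (FB_DEFAULT), as in the hack: the bad byte becomes U+FFFD.
my $lenient = Encode::decode('UTF-8', $bad, Encode::FB_DEFAULT());

printf "strict decode %s; lenient result has %d characters\n",
    ($strict_ok ? 'succeeded' : 'died'), length($lenient);
```

The price of the lenient mode is silent data loss: the original byte cannot be recovered from U+FFFD.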

To sum up.
I think it is a good idea to make the MARC blob a binary object, so to speak.
I don't know whether you should just apply my simple hacks to the CPAN code,
or whether a thorough rewrite of some parts of the modules is called for.

Those changes may require some changes in coding style in the scripts that use 
MARC::Record.
But probably all you would have to do is remove all the strange code you put in 
there as workarounds for the character bugs.

And yes, I have been using MARC::Charset in combination with this technique, 
without any problems that I can recall. :-)

/Leif



From: ed.summ...@gmail.com [ed.summ...@gmail.com] on behalf of Ed Summers 
[...@pobox.com]
Sent: 12 October 2010 17:13
To: perl4lib@perl.org
Subject: Re: MARC-perl: different versions yield different results

Hi Leif,

Is the downside to this approach that you are modifying a CPAN module
in place, or is it something to do with the behavior of 'use bytes'?
Would there be any undesirable side effects to adding 'use bytes' to
MARC::File::USMARC::encode on CPAN?

//Ed


Re: Stripping out Unicode combining characters (diacritics)

2008-05-06 Thread Leif Andersson
I've been doing it the way Mike R suggested for quite a while.
But some characters do not map nicely into this scheme.

So you may want to handle characters like the German eszett, the oe ligature, 
etc. manually:

s/\x{00df}/ss/g;
s/\x{0152}/Oe/g;
s/\x{0153}/oe/g;
...to be continued...
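Combining Mike's NFD approach with a manual table could look like the sketch below. It assumes the input is already a decoded Perl character string; the helper strip_diacritics is hypothetical, and the table holds only the three characters listed above. None of them has a canonical decomposition, so NFD plus s/\pM+//g leaves them untouched and the table has to catch them.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Characters with no canonical decomposition, mapped by hand (extend as needed).
my %manual = (
    "\x{00DF}" => 'ss',   # German eszett
    "\x{0152}" => 'Oe',   # OE ligature
    "\x{0153}" => 'oe',   # oe ligature
);
my $manual_class = join '', keys %manual;

sub strip_diacritics {
    my ($text) = @_;
    $text = NFD($text);                           # base letters + combining marks
    $text =~ s/\pM+//g;                           # drop the combining marks
    $text =~ s/([$manual_class])/$manual{$1}/g;   # then the hand-mapped characters
    return $text;
}

print strip_diacritics("Archiconfr\x{E9}rie"), "\n";
```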

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281

-Original Message-
From: Doran, Michael D [mailto:[EMAIL PROTECTED] 
Sent: 6 May 2008 04:13
To: Mike Rylander
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: RE: Stripping out Unicode combining characters (diacritics)

Hi Mike,

I appreciate the quick reply.  I am familiar with the Unicode::Normalize module 
(and will also be using that), but I left it out of this question because it's 
not relevant to the problem I'm currently trying to solve.  The text I'm trying 
to strip diacritics out of does not have precomposed accented characters.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/



-Original Message-
From: Mike Rylander [mailto:[EMAIL PROTECTED]
Sent: Mon 5/5/2008 8:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)
 
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D [EMAIL PROTECTED] wrote:
[snip]

  I'm pulling my hair out on this... so any help would be appreciated.  If 
 there's any other info I can provide, let me know.


You'll want to transform the text to NFD format (nominally, base
characters plus combining marks) instead of NFC (precombined
characters) using Unicode::Normalize:

 use Unicode::Normalize;

 my $text = NFD($original);
 $text =~ s/\pM+//go;

Hope that helps.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com



Re: Help for utf-8 output

2008-03-01 Thread Leif Andersson
It seems there is a little bug (by design) kicking in.

The leader gets wrong and some characters get wrong in this case:
   + Reading a raw marc record (utf8) from file
   + Turning it into a MARC::Record object
   + Without modification writing it out to file.
 Yes. Even without modification the bug manifests itself!

Let's start with code that simply copies one record from a file, utf8.mrc, 
containing one or more MARC records. This basic operation, which does not 
involve MARC::Record, is OK.

#!perl -w
use strict;
#
open(IN, "<utf8.mrc")  || die 1;
open(OUT, ">out_good.mrc") || die 2;
binmode IN;
binmode OUT;
#
# Read in raw MARC
$/ = "\x1D";
my $marc = <IN>;
print OUT $marc;
__END__

Now, we're adding MARC::Record to the process, along with some debug info.
Example code producing *faulty* record:

#!perl -w
use strict;
use MARC::Record;
use Devel::Peek;
#
open(IN, "<utf8.mrc")  || die 1;
open(OUT, ">out_bad.mrc") || die 2;
binmode IN;
binmode OUT;
#
# Read in raw MARC
$/ = "\x1D";
my $marc = <IN>;
Dump($marc);  # the utf8-flag is not on
my $obj  = MARC::Record->new_from_usmarc( $marc );
# Convert back to raw MARC
my $marc2 = $obj->as_usmarc();
Dump($marc2); # the utf8-flag IS on
print OUT $marc2;
__END__


In this case the leader and actual length will not agree, as your utf8 
characters have turned into latin1.
The problem is that $marc2 has the utf8 flag set internally by Perl.
And the conversion on output is made in spite of binmode.

We can get around the problem by either (for instance)
use bytes;
  or
Encode::_utf8_off($marc2);
before printing to file.
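The output side can be reproduced with core Perl alone. In this sketch a utf8-flagged string (forced with utf8::upgrade, standing in for what as_usmarc returns) is printed to an in-memory file opened with :raw, the equivalent of binmode. Because every character fits in one byte, Perl quietly downgrades it and é comes out as the single Latin-1 byte E9. Converting the string with the core utf8::encode first keeps the two-byte UTF-8 sequence; Encode::_utf8_off from the post gives the same bytes here, since the internal bytes of a flagged string are UTF-8. write_raw is a made-up helper name.

```perl
use strict;
use warnings;

# Stand-in for a record string returned with the utf8 flag on.
my $flagged = "Archiconfr\x{E9}rie";
utf8::upgrade($flagged);    # force the internal utf8 flag, as Dump() would report

# Print $data to an in-memory file opened with :raw (what binmode gives you)
# and return the bytes that were written.
sub write_raw {
    my ($data) = @_;
    my $buffer = '';
    open(my $fh, '>:raw', \$buffer) or die "cannot open in-memory file: $!";
    print {$fh} $data;
    close($fh) or die "cannot close in-memory file: $!";
    return $buffer;
}

# Every character fits in one byte, so Perl downgrades: é is written as E9.
my $as_is = write_raw($flagged);

# utf8::encode turns the characters into UTF-8 bytes and clears the flag.
my $encoded = $flagged;
utf8::encode($encoded);
my $fixed = write_raw($encoded);

printf "as is: %d bytes, encoded first: %d bytes\n", length($as_is), length($fixed);
```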

But shouldn't MARC::Record take care of this for us?
A file of MARC records may contain records in different encodings.
The text parts of a MARC record can be treated as being in particular encodings, 
but the blob itself, I suppose, should be exposed to the caller as pure 
binary.

Are there any drawbacks to letting MARC::Record strip off any utf8 
flag before returning the record from as_usmarc()?
If not I suggest this change be made to a future release of MARC::Record.

I should also add that this character mess only sets in when doing I/O.
If you are updating your databases through one API or another, you are probably 
OK!


Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library

-Original Message-
From: Doran, Michael D [mailto:[EMAIL PROTECTED] 
Sent: 21 February 2008 18:49
To: perl4lib@perl.org
Subject: RE: Help for utf-8 output

Hi Jackie,

I'm working on a very similar problem... converting theses/dissertations 
records (in XML) to MARC records.  I'm still in the testing stage, but have had 
similar problems with records with diacritics in the 100 or 245 fields (however 
diacritics in a 520a field don't seem to cause any problems).  Since our 
records are not diacritic rich it's hard to determine the exact extent of the 
problem.

I am using these versions:
  Perl v5.8.8
  MARC::Charset 0.98
  MARC::Lint 1.43
  MARC::Record 2.0
  XML::LibXML 1.66

Here's an example bad record (which I have minimized to just the 245 field):

marcdump test.mrc
test.mrc
LDR 00127cam a2200037   4500
245 13 _aAn Empirical Test Of The Situational Leadership® Model In Japan /
   _cRiho Yoshioka.

 Recs  Errs Filename
- - 
1 1 test.mrc

When I run test.mrc through MARC::Lint, I get this message:

 Invalid record length in record 1: Leader says 00127 bytes but it's actually 125
 Invalid length in directory for tag 245 in record 1
 field does not end in end of field character in tag 245 in record 1

When examined in vi the character in question, a Registered Sign, appears to be 
correctly UTF-8 encoded C2AE, and the bib Leader (position 09=a) indicates that 
it is Unicode encoded.  I've attached the MARC record.

I noticed that when I run your record (ck245.dat) through MARC::Lint, I get the 
same invalid record length message:

 Invalid record length in record 3: Leader says 00567 bytes but it's actually 569
 field does not end in end of field character in tag 100 in record 3
 field does not end in end of field character in tag 245 in record 3
 Invalid indicators .10 forced to blanks in record 3 for tag 245

 field does not end in end of field character in tag 260 in record 3
 Invalid indicators .   forced to blanks in record 3 for tag 260

 field does not end in end of field character in tag 300 in record 3
 Invalid indicators .   forced to blanks in record 3 for tag 300

 field does not end in end of field character in tag 502 in record 3
 Invalid indicators .   forced to blanks in record 3 for tag 502

 field does not end in end of field character in tag 504 in record 3
 Invalid indicators .   forced to blanks in record 3 for tag 504

 field does not end in end of field character in tag 690 in record 3
 Invalid indicators . 4 forced to blanks in record 3 for tag 690

Anybody have any ideas?

-- Michael

# Michael Doran

Re: passing parameters to function as variable

2007-08-17 Thread Leif Andersson

Merritt,

I guess you can do it with eval, but you can also do it with an array instead 
of a string.

my @a245 = ('a', 'The Title',
'b', 'Subtitle',
'c', 'Author',
'h', '[Electronic resource]',
);

my $field = $record->field('245');
my $revised_245 =  MARC::Field->new('245', '', '', @a245);

$field->replace_with($revised_245);

Leif Andersson
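Building the list with a loop lets empty subfields drop out without any if/elsif chains. A minimal sketch, assuming the cleaned-up subfield values sit in a hash (%sf and its sample values are hypothetical); the resulting array holds code/value pairs in the order MARC::Field->new expects after the tag and indicators. The MARC::Field call is shown as a comment so the sketch runs without MARC::Record installed.

```perl
use strict;
use warnings;

# Hypothetical cleaned-up 245 subfields; empty or missing ones are skipped.
my %sf = (a => 'Accountancy Ireland', b => '', c => undef, n => '', p => '');

my @subfields;
for my $code (qw(a b c h n p)) {
    # Supply the default $h, as in the original snippet, when the record has none.
    my $value = $code eq 'h' ? ($sf{h} // '[Electronic resource]') : $sf{$code};
    push @subfields, $code => $value if defined $value && length $value;
}

# With MARC::Record loaded, the list is passed straight through:
#   my $revised_245 = MARC::Field->new('245', '', '', @subfields);
print join(' | ', @subfields), "\n";
```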

-Original Message-
From: Merritt H Lennox [mailto:[EMAIL PROTECTED] 
Sent: 16 August 2007 20:35
To: perl4lib@perl.org
Subject: passing parameters to function as variable

Hi -

I was wondering if anyone has tried creating a variable containing the
parameter list for a MARC method.

In the snippet below, I'm doing some cleanup on subfields of the 245,
and inserting a default subfield h for those lacking one.  I never know
which subfields I'll have, so rather than using a cumbersome series of
if...elsif...else statements, I'd construct the string I want to pass to
the new() function, and insert it once.  If any subfield doesn't exist,
the SFstring variable for that subfield is empty.

I know that the string in $subfield_list_245 contains the string I want,
from having used the print statement [1], but every run gives me the
error "Field 245 must have at least one subfield."

I've tried using the eval statement on the $subfield_list_245, and using
various quotation marks around it, but new() just doesn't recognize or
want to work with the string.

Is there a way to make this approach work?  Are the blanks I've
concatenated to the end a problem?

Or is there a more succinct way of approaching this in the first place?

Start snippet
if ( ! $subfield_245h ) {

my $subfield_list_245 = $a_SFstring . $b_SFstring . $c_SFstring .
',h=' . ' . [Electronic resource] . ' . $n_SFstring . $p_SFstring;

## print $subfield_list_245;

my $revised_245 =  MARC::Field->new('245', '',
'',$subfield_list_245);

$record->replace_with($revised_245);


}
end snippet

Thanks for any ideas!

Merritt

[1] for my simple test record, the subfield list prints out:
a='Accountancy Ireland',h='[Electronic resource]'

Merritt Lennox
Library Management System Administrator
550A Bird Library
Syracuse University
Syracuse, NY 13244
315-443-9629
[EMAIL PROTECTED]

Never doubt that a small group
of thoughtful, committed citizens
can change the world.  
Indeed, it's the only thing
that ever has. 
- Margaret Mead 



MARC::Record and importing broken UTF8

2007-03-01 Thread Leif Andersson
Nice,

But what is the best way to deal with all those broken UTF8 encodings we 
encounter over and over again when importing MARC records from outer space?

As it is now the application dies with something like
'utf8 "\xXX" does not map to Unicode at C:/Perl/lib/Encode.pm line 166.'

The problem seems to lie in MARC::File::Encode

sub marc_to_utf8 {
# if there is invalid utf8 date then this will through an exception
# let's just hope it's valid :-)
return decode( 'UTF-8', $_[0], 1 );
}

Is it possible to introduce a sloppy mode switch?
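A sloppy-mode switch could look something like the sketch below, written as a standalone helper (decode_field and its arguments are hypothetical, not part of MARC::File::Encode). It decodes strictly first and, only when the caller asks for sloppy mode, retries with CHECK set to 0, so malformed bytes become U+FFFD and a warning names the record instead of the application dying.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode one MARC field as UTF-8.
# Strict by default; with $sloppy true, malformed bytes become U+FFFD and a
# warning names the record instead of the whole run dying.
sub decode_field {
    my ($octets, $sloppy, $record_id) = @_;
    my $copy = $octets;    # FB_CROAK may consume the processed part of its input
    my $text = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return $text if defined $text;
    die "record $record_id: invalid UTF-8\n" unless $sloppy;
    warn "record $record_id: invalid UTF-8, bad bytes replaced\n";
    return Encode::decode('UTF-8', $octets, Encode::FB_DEFAULT());
}
```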

Leif

==
Leif Andersson, Systems Librarian
Stockholm University Library

-Original Message-
From: Mike Rylander [mailto:[EMAIL PROTECTED] 
Sent: 26 January 2007 02:35
To: Public Open-ILS tech discussion; perl4lib
Subject: Re: Fwd: Module update for MARC::Record

OK, folks, MARC::Record 2.0.0 is officially out.

  http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

Give it a go, and let me know if you see anything broken.  Sorry for the delay!

-miker



MARC::Charset and transcoding of MARC::Record objects

2006-09-22 Thread Leif Andersson

How would you go about transcoding a whole MARC record, contained in a 
MARC::Record object, from MARC8 to UTF8?

I can see from the documentation that transcoding smaller pieces of data with 
MARC::Charset looks quite easy.
But how do you deal with whole records?

I seem to recall someone on this list mentioning the path MARC record (MARC8) 
-> MARCXML (UTF8) -> MARC record (UTF8).
This trip involves, I'd guess, MARC::File::XML in addition to MARC::Charset.

But I suspect there may be different approaches here.

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library



MARC::Record ordering of fields

2005-05-13 Thread Leif Andersson

How would you go about re-ordering the fields in a MARC::Record record?

I just needed that kind of thing and after some struggling came up with:

@{$record->{_fields}} =
    sort { lc($a->{_tag}) cmp lc($b->{_tag}) }
    @{$record->{_fields}};


It seems to work, but I am interested in how others would address the same 
problem.
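One detail worth guarding when sorting fields this way: repeated tags, such as several 650s, should keep their relative order, and use sort 'stable' asks Perl to guarantee that. The sketch uses plain [tag, data] pairs as hypothetical stand-ins for MARC::Field objects. Note that reaching into $record->{_fields} depends on MARC::Record's internal layout rather than its documented interface.

```perl
use strict;
use warnings;
use sort 'stable';   # fields with the same tag keep their original order

# Hypothetical stand-ins for MARC::Field objects: [tag, data].
my @fields = (['245', 'Title'], ['650', 'Subject 1'], ['100', 'Author'],
              ['020', 'ISBN'],  ['650', 'Subject 2']);

# Same comparison as above: tags compared as lower-cased strings.
my @ordered = sort { lc($a->[0]) cmp lc($b->[0]) } @fields;

print join(', ', map { "$_->[0] $_->[1]" } @ordered), "\n";
```

The lc() only matters for tags that contain letters; for purely numeric tags a plain cmp gives the same order.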

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library


Re: Sort with MARC::Record

2005-01-31 Thread Leif Andersson

This is one way to do it:

#!/usr/local/bin/perl -w
use strict;
use MARC::Batch;

# sort marc records on field 001
# usage: sort_marc.pl infile.mrc > outfile.mrc

my $batch   = MARC::Batch->new( 'USMARC', $ARGV[0] );
my @records = ();
my @f001    = ();
my $idx     = 0;

while ( my $MARC = $batch->next ) {
    push(@records, $MARC);
    # remember each record's position together with its 001 value
    push(@f001, [$idx++, $MARC->field('001')->as_string]);
}

# numeric comparison: assumes the 001 values are numbers
foreach my $rec (sort { $a->[1] <=> $b->[1] } @f001) {
    print $records[$rec->[0]]->as_usmarc;
}

__END__

You may need to guard yourself against records having no field 001.
518 records, if that is what you have to deal with, should under normal 
conditions not raise any memory issues.
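A few adjustments that may help with data like Jackie's, sketched with hashes standing in for MARC::Record objects (the records and IDs are hypothetical): comparing with cmp instead of numerically copes with 001 values that are not purely numeric, records without a 001 are sorted to the end instead of stopping the script, and a hash of wanted IDs picks out the subset named in the second file without needing the sort at all.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for MARC::Record objects: only the 001 value matters here.
my @records = ({ f001 => 'b300' }, { f001 => undef }, { f001 => 'a200' }, { f001 => 'a100' });
my @wanted_ids = ('a100', 'b300');   # e.g. read from the second file

# Sort on 001 as a string; records without a 001 go last.
my @sorted = map  { $_->[1] }
             sort { (defined $a->[0] ? 0 : 1) <=> (defined $b->[0] ? 0 : 1)
                    || ($a->[0] // '') cmp ($b->[0] // '') }
             map  { [ $_->{f001}, $_ ] } @records;

# Keep only the records whose 001 is in the wanted list.
my %wanted   = map { $_ => 1 } @wanted_ids;
my @selected = grep { defined $_->{f001} && $wanted{ $_->{f001} } } @sorted;

print join(',', map { $_->{f001} // 'none' } @sorted), "\n";
```

With real records, $_->{f001} would be replaced by something like $record->field('001') guarded against an undefined result.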

Leif


-Original Message-
From: Jackie Shieh [mailto:[EMAIL PROTECTED]
Sent: 31 January 2005 21:40
To: perl4lib@perl.org
Subject: Sort with MARC::Record 



Has anyone sorted a file of hundreds of records by 001?

I have a file of 518 records unsorted and a file of
sorted ids from 001 (406).  I would like to sort my marc
518 records first before extracting the 406 records based
on the 2nd file from the set of 518 records.  I'd appreciate
any suggestions, thanks.

--Jackie

|Jackie Shieh
|Special Projects & Collections Team
|Harlan Hatcher Graduate Library
|University of Michigan
|920 North University
|Ann Arbor, MI  48109-1205
|Phone: 734.936.2401   FAX: 734.615.9788
|E-mail: [EMAIL PROTECTED]


Return values from MARC::Record

2003-11-06 Thread Leif Andersson

I think the return values from various methods in the MARC::Record distribution could 
be more intuitive.
And also more consistent.

If we have a BAD record in $record and call $record->field($tag), we get 0 in 
return.
But if we try $record->subfield($sub), we get undef.

I would prefer undef for both.


With the same BAD record we try $subfield = eval { 
$record->field($tag)->subfield($sub) }
This is the only case where we have to put the code in eval.
Should MARC:: take care of the eval for us? I am beginning to think so.


At the bottom of this we have the creation of the record.
What would we expect to get back from these?

my $record1 = MARC::Record->new_from_usmarc(  );
my $record2 = MARC::Record->new_from_usmarc( undef );
my $record3 = MARC::Record->new_from_usmarc( '' );
my $record4 = MARC::Record->new_from_usmarc( 'not a valid record' );

Currently they all give us a broken record object.
For the first three I would prefer to get undef in return.
That is how MARC::Batch treats the records.

The $record4 case is a bit more complicated.
We have to decide what counts as a valid record in this context.
The answer would be, I'd guess, if we can perform other methods on the object it is a 
valid record (so far).
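One way to get undef for the first three cases today is to check the blob before calling new_from_usmarc. A minimal sketch with a hypothetical helper, looks_like_usmarc: it rejects undef, empty input, a leader length that is not five digits, a length that does not match the leader, and a missing record terminator. These are illustrative criteria chosen here, not MARC::Record's own rules.

```perl
use strict;
use warnings;

# Hypothetical pre-check to run before MARC::Record->new_from_usmarc.
sub looks_like_usmarc {
    my ($blob) = @_;
    return 0 unless defined $blob && length($blob) > 24;  # a leader plus something
    my $reclen = substr($blob, 0, 5);
    return 0 unless $reclen =~ /^\d{5}$/;                # leader/00-04: five digits
    return 0 unless length($blob) == $reclen;            # length agrees with leader
    return 0 unless substr($blob, -1) eq "\x{1D}";       # ends with record terminator
    return 1;
}

for my $candidate (undef, '', 'not a valid record') {
    printf "%s -> %d\n", (defined $candidate ? "'$candidate'" : 'undef'),
        looks_like_usmarc($candidate);
}
```

In a script it might be used as: my $record = looks_like_usmarc($blob) ? MARC::Record->new_from_usmarc($blob) : undef;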


Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library