RE: sending marc records into a script that uses MARC::Batch
Thanks for your ideas. I will try your suggestions.

John

From: Timothy Prettyman [mailto:timo...@umich.edu]
Sent: Friday, May 30, 2014 11:39 AM
To: perl4lib
Subject: Re: sending marc records into a script that uses MARC::Batch

I think you have to check for warnings as you read each record, so try moving your error handling code right after the batch->next() call. But Robin's suggestion is good advice, and is probably a more robust way to handle the crud that can show up in a file of marc records.

-Tim
Re: sending marc records into a script that uses MARC::Batch
I think you have to check for warnings as you read each record, so try moving your error handling code right after the batch->next() call.

But Robin's suggestion is good advice, and is probably a more robust way to handle the crud that can show up in a file of marc records.

-Tim

On Fri, May 30, 2014 at 5:20 AM, Stefano Bargioni wrote:
> If I'm not wrong,
> $batch->strict_off();
> will keep your loop from printing warnings and from stopping before all records are processed.
> HTH. Stefano
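Tim's suggestion above, checking for warnings right after each batch->next() call instead of once at the bottom of the script, could be sketched roughly as follows. This is only an illustration: the sub name read_with_warnings and the log handle argument are made up for the example, and it assumes (as I understand the MARC::Batch documentation) that warnings() returns whatever has accumulated so far and clears the buffer as a side effect.

```perl
use strict;
use warnings;

# Hypothetical helper: reads every record from $batch and reports warnings
# as soon as they appear, tagged with the running record number.
# $batch is expected to behave like a MARC::Batch object (next, warnings).
# Returns the number of records read and the number that carried warnings.
sub read_with_warnings {
    my ($batch, $log_fh) = @_;
    my ($seen, $flagged) = (0, 0);

    while ( my $record = $batch->next() ) {
        $seen++;

        # Ask for warnings immediately after next(), so each warning can be
        # tied to the record that was just read.
        if ( my @warnings = $batch->warnings() ) {
            $flagged++;
            print {$log_fh} "record $seen: $_\n" for @warnings;
        }

        # ... work on $record here ...
    }

    return ($seen, $flagged);
}
```

With a real batch this might be called as read_with_warnings($batch, \*STDERR) after $batch->strict_off(). Logging the record number next to each warning makes it easier to find the offending record in a file of several million.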
Re: sending marc records into a script that uses MARC::Batch
If I'm not wrong,

$batch->strict_off();

will keep your loop from printing warnings and from stopping before all records are processed.

HTH. Stefano

On 29 May 2014, at 23:13, John E Guillory wrote:
> Thanks Timothy for your help.
>
> When processing about 5 million records I would expect some crazy records. The new script (incorporating Timothy’s suggestions) exited prematurely on record 85,877 with: “Warnings detected: Entirely empty subfield found in tag 260”. I know 260 is publication stuff but it’s not “required”. I’m deliberately printing warnings but again the script exited prematurely.
>
> Thanks for assistance.
> John
Re: sending marc records into a script that uses MARC::Batch
John E Guillory wrote on Thu 29-05-2014 at 21:13 [+]:
> “Warnings detected: Entirely empty subfield found in tag 260”

An entirely empty subfield is an illegally formatted thing, at least according to the rules of MARC::Record/MARC::Field, and so I assume the MARC format itself. So it's not that it's a required field or anything like that, it's that the USMARC is incorrectly formatted, so the parser throws an exception with 'die'.

To catch the exception rather than having your program terminate, you need to wrap the call that's failing in an 'eval' block, and check for errors after it, handling them appropriately. You might be lucky and the file is OK and the parser can continue, however you might be unlucky and this corrupt record causes the parser to get confused and it can't find the start of the next record.

See 'perldoc -f eval' for more information on using it for error/exception handling.

--
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
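The eval approach Robin describes might look something like the sketch below. It assumes, as stated above, that the failing call is batch->next() and that it dies on a badly formatted record. The sub names next_or_error and copy_readable_records and the $max_errors cut-off are invented for the example. The cut-off is there for the unlucky case Robin mentions, where the parser cannot find the start of the next record and keeps failing.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $batch->next() inside eval so a die in the
# parser is caught instead of ending the program.
# Returns ($record, undef) on success, (undef, undef) at end of input,
# and (undef, $error) when the call died.
sub next_or_error {
    my ($batch) = @_;
    my $record;
    my $ok = eval { $record = $batch->next(); 1 };
    return ( undef, $@ || 'unknown error' ) unless $ok;
    return ( $record, undef );
}

# Hypothetical driver: copies every readable record to $out_fh and gives up
# once $max_errors failures have been seen.
sub copy_readable_records {
    my ($batch, $out_fh, $max_errors) = @_;
    my ($written, $errors) = (0, 0);

    while (1) {
        my ($record, $error) = next_or_error($batch);

        if ( defined $error ) {
            $errors++;
            warn "skipping unreadable record: $error";
            last if $errors >= $max_errors;    # parser may be lost, stop here
            next;
        }

        last unless $record;                   # normal end of input

        print {$out_fh} $record->as_usmarc();
        $written++;
    }

    return ($written, $errors);
}
```

The "1" at the end of the eval block lets the caller tell a real failure apart from next() simply returning nothing at the end of the file.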
RE: sending marc records into a script that uses MARC::Batch
Thanks Timothy for your help.

When processing about 5 million records I would expect some crazy records. The new script (incorporating Timothy’s suggestions) exited prematurely on record 85,877 with: “Warnings detected: Entirely empty subfield found in tag 260”. I know 260 is publication stuff but it’s not “required”. I’m deliberately printing warnings but again the script exited prematurely.

Thanks for assistance.

John

From: Timothy Prettyman [mailto:timo...@umich.edu]
Sent: Thursday, May 29, 2014 11:23 AM
To: John E Guillory
Cc: perl4lib@perl.org
Subject: Re: sending marc records into a script that uses MARC::Batch

For your first question, instead of:

    $batch = MARC::Batch->new('USMARC',<STDIN>);

use:

    $batch = MARC::Batch->new('USMARC',STDIN);

For your second, the error is likely caused when a field you're using as_string() on doesn't exist in the record.

So, you could do something like the following:

    $field = $record->field('008');
    $field or do {                            # check for existence of field
        print "no 008 field for record\n";    # no field
        next;                                 # skip this record (or whatever)
    };
    $field_008 = $field->as_string();

Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan
Re: sending marc records into a script that uses MARC::Batch
For your first question, instead of:

    $batch = MARC::Batch->new('USMARC',<STDIN>);

use:

    $batch = MARC::Batch->new('USMARC',STDIN);

For your second, the error is likely caused when a field you're using as_string() on doesn't exist in the record.

So, you could do something like the following:

    $field = $record->field('008');
    $field or do {                            # check for existence of field
        print "no 008 field for record\n";    # no field
        next;                                 # skip this record (or whatever)
    };
    $field_008 = $field->as_string();

Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan

On Thu, May 29, 2014 at 12:08 PM, John E Guillory wrote:
> Hello,
>
> Two questions please:
>
> 1. I’ve written a script that opens a marc file for reading using this syntax:
>
>     $file = $ARGV[0];
>     $batch = MARC::Batch->new('USMARC',$file);
>
> It then loops thru the records using this syntax:
>
>     while ( $record = $batch->next()) {
>         …..check position 6, 7 of leader and position 23 of 008 and make some changes
>     }
>
> This works great. However, instead of accessing the file this way, I want to pipe the output of a previously run marc dump command directly into this script via the pipe.
>
> I understand that this can be done using this syntax: while ($line = <STDIN>){ …}, but I don’t understand how to use that STDIN with “MARC::Batch->new('USMARC',$file);”. This does not work:
>
>     $batch = MARC::Batch->new('USMARC',<STDIN>);
>
> 2. My current script successfully reads and processes a marc file of over 5 gigs! But it exits entirely on record 160,585 with the error from MARC::Batch, “Can't call method "as_string" on an undefined value at ./marc_batch.pl”. Documentation on using MARC::Batch says that to tell it to continue processing even when errors are encountered one should use strict_off(), then print/report warnings at the bottom of the script. I don’t think my particular error is being handled by the strict_off() setting. Does anybody know what causes/how to fix the “Can’t call method as_string” error? Full script below; it’s pretty short, thanks to MARC::Batch.
>
> Thanks for insights!
>
>     use MARC::Batch;
>
>     $file = $ARGV[0];
>     chomp($file);
>
>     $batch = MARC::Batch->new('USMARC',$file);
>     $batch->strict_off();    # otherwise script exits when encounters errors
>
>     open(OUT,'>new_marc');
>
>     while ( $record = $batch->next()) {
>         $leader = $record->leader();
>         $leader_pos_6 = substr($leader,6,1);
>         $leader_pos_7 = substr($leader,7,1);
>
>         $field = $record->field('008');
>         $field_008 = $field->as_string();
>         $field_008_position_23 = substr($field_008,23,1);
>
>         if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") && ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {
>
>             $control_num = $record->field('001');
>             $control_num = $control_num->as_string();
>
>             print "008 position 23: $field_008_position_23 \n";
>             print "OLD leader: $leader \n";
>             $old_leader = $leader;
>             substr($leader,6,1) = 'm';
>             print "NEW leader: $leader \n";
>
>             print OUT $record->as_usmarc();
>             print "$control_num|$old_leader|$leader|$field_008\n";
>
>         } else {    # not a match so just print this one unchanged…
>             print OUT $record->as_usmarc();
>         }
>     }
>
>     # handles errors:
>     if (@warnings = $batch->warnings()) {
>         print "\n Warnings detected: \n", @warnings;
>     }
>
>     close(OUT);
>     close(LOG);
>
> John Guillory
> Louisiana Library Network
> 225.578.3758
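Putting Tim's two answers together, a version of the loop that reads from a pipe and tolerates records without an 008 might look like the sketch below. Several things in it are assumptions rather than anything stated in the thread: the sub names, the '--stdin' switch, passing \*STDIN as a glob reference, and the choice to pass records without an 008 through unchanged instead of dropping them. It also reads the condition in the original script as "leader/06 is a AND leader/07 is m AND 008/23 is o or s", which needs parentheses around the "or" part because && binds tighter than ||, and it writes the edited leader back with leader($leader), since changing a copy of the string does not change the record.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the character at $pos of a control field,
# or undef when the field is missing or too short.
sub control_char {
    my ($record, $tag, $pos) = @_;
    my $field = $record->field($tag) or return;
    my $data = $field->as_string();
    return if length($data) <= $pos;
    return substr( $data, $pos, 1 );
}

# Assumed intent of the original test: a AND m AND (o OR s).
sub is_target {
    my ($type, $level, $form) = @_;
    return $type eq 'a' && $level eq 'm' && ( $form eq 'o' || $form eq 's' );
}

# Copies every record to $out_fh, changing leader/06 to 'm' on matches.
# Returns how many records were changed and how many had no usable 008.
sub process_batch {
    my ($batch, $out_fh) = @_;
    my ($changed, $no_008) = (0, 0);

    while ( my $record = $batch->next() ) {
        my $leader = $record->leader();
        my $form   = control_char( $record, '008', 23 );

        if ( !defined $form ) {
            $no_008++;    # nothing to test, pass the record through as-is
        }
        elsif ( is_target( substr($leader, 6, 1), substr($leader, 7, 1), $form ) ) {
            substr( $leader, 6, 1 ) = 'm';
            $record->leader($leader);    # store the edited leader in the record
            $changed++;
        }

        print {$out_fh} $record->as_usmarc();
    }

    return ($changed, $no_008);
}

sub main {
    require MARC::Batch;
    binmode STDIN;
    binmode STDOUT;
    my $batch = MARC::Batch->new( 'USMARC', \*STDIN );    # read from the pipe
    $batch->strict_off();
    my ($changed, $no_008) = process_batch( $batch, \*STDOUT );
    print STDERR "changed: $changed, without 008: $no_008\n";
}

# '--stdin' is an invented switch so the file can also be loaded without running.
main() if @ARGV && $ARGV[0] eq '--stdin';
```

Saved under a name of your choosing, it would be run at the end of a pipeline, with the marc dump command on the left of the pipe and "perl yourscript.pl --stdin > new_marc" on the right. Progress messages go to STDERR so they do not end up mixed into the MARC output.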