RE: sending marc records into a script that uses MARC::Batch

2014-06-03 Thread John E Guillory
Thanks for your ideas. I will try your suggestions.

John


Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Timothy Prettyman
I think you have to check for warnings as you read each record, so try
moving your error-handling code right after the $batch->next() call.  But
Robin's suggestion is good advice, and is probably a more robust way to
handle the crud that can show up in a file of MARC records.

-Tim
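
To make the "check right after next()" advice concrete, here is a minimal
sketch. It deliberately does not load MARC::Batch: the FakeBatch package is a
hypothetical stand-in written only for this example, assumed to mimic next()
handing back one record at a time and warnings() returning and then clearing
the pending warnings. With the real module the loop body should look the same,
and strict_off() would presumably still be needed so the loop keeps going.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MARC::Batch, only so the sketch runs on its
# own: next() hands back one record at a time, and warnings() returns
# the pending warnings and empties the buffer.
package FakeBatch;

sub new {
    my ($class, @records) = @_;
    return bless { records => [@records], warnings => [] }, $class;
}

sub next {
    my ($self) = @_;
    my $rec = shift @{ $self->{records} };
    return unless defined $rec;
    push @{ $self->{warnings} }, "Entirely empty subfield found in tag 260"
        if $rec =~ /empty/;
    return $rec;
}

sub warnings {
    my ($self) = @_;
    my @w = @{ $self->{warnings} };
    $self->{warnings} = [];
    return @w;
}

package main;

my $batch = FakeBatch->new('rec-1', 'rec-2-empty', 'rec-3');
my ($count, @log) = (0);

while ( my $record = $batch->next() ) {
    $count++;
    # Check right after next(), so each warning is tied to its record.
    if ( my @warnings = $batch->warnings() ) {
        push @log, "record $count: $_" for @warnings;
    }
    # ... process $record here ...
}

print "$_\n" for @log;
print "processed $count records\n";
```

Checking inside the loop means a warning can be reported with the number of
the record that caused it, instead of one undifferentiated list at the end.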



Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Stefano Bargioni
If I'm not mistaken, calling
$batch->strict_off();
should keep your loop from printing the warnings and then stopping before
all the records have been processed.
HTH. Stefano


Re: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread Robin Sheat
John E Guillory wrote on Thu 29-05-2014 at 21:13 [+]:
> “Warnings detected: Entirely empty subfield found in tag 260”

An entirely empty subfield is an illegally formatted thing, at least
according to the rules of MARC::Record/MARC::Field, and so I assume the
MARC format itself. So it's not that it's a required field or anything
like that, it's that the USMARC is incorrectly formatted, so the parser
throws an exception with 'die'.

To catch the exception rather than having your program terminate, you
need to wrap the call that's failing in an 'eval' block, and check for
errors after it, handling them appropriately. You might be lucky: the
file is otherwise OK and the parser can continue. Or you might be unlucky:
the corrupt record confuses the parser and it can't find the start of the
next record.

See 'perldoc -f eval' for more information on using it for
error/exception handling.
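
A minimal sketch of the eval pattern described above. parse_record() and
try_parse() are hypothetical names invented for this example; parse_record()
merely stands in for whichever call is dying, and the error text is borrowed
from John's report.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser call such as $batch->next():
# it dies on a "corrupt" record and returns the parsed record otherwise.
sub parse_record {
    my ($raw) = @_;
    die "Entirely empty subfield found in tag 260\n" if $raw eq 'bad';
    return "parsed($raw)";
}

# Wrap the risky call in eval so one bad record does not end the run.
# Returns (result, error); exactly one of the two is defined.
sub try_parse {
    my ($raw) = @_;
    my $result;
    my $ok = eval { $result = parse_record($raw); 1 };
    return ($result, undef) if $ok;
    my $err = $@ || "unknown error\n";
    chomp $err;
    return (undef, $err);
}

my @skipped;
for my $raw (qw(rec1 bad rec3)) {
    my ($rec, $err) = try_parse($raw);
    if (defined $err) {
        push @skipped, "$raw: $err";   # remember the failure and move on
        next;
    }
    print "$rec\n";
}
print "skipped: $_\n" for @skipped;
```

Ending the eval block with a true value and testing the block's return value,
rather than testing $@ alone, is a common precaution because $@ can be
overwritten before it is inspected.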

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF




RE: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread John E Guillory
Thanks Timothy for your help.

When processing about 5 million records I would expect some crazy records. The
new script (incorporating Timothy’s suggestions) exited prematurely on record
85,877 with: “Warnings detected: Entirely empty subfield found in tag 260”. I
know 260 holds publication information, but it’s not “required”. I’m
deliberately printing warnings, but again the script exited prematurely.

Thanks for assistance.
John






On Thu, May 29, 2014 at 12:08 PM, John E Guillory <jo...@lsu.edu> wrote:
Hello,
Two questions please:


1.  I’ve written a script that opens a MARC file for reading using this
syntax:

$file = $ARGV[0];
$batch = MARC::Batch->new('USMARC',$file);

It then loops through the records using this syntax:

while ( $record = $batch->next()) {
    # ... check positions 6 and 7 of the leader and position 23 of the 008,
    # and make some changes
}

This works great. However, instead of accessing the file this way, I want to
pipe the output of a previously run MARC dump command directly into this
script.

I understand that this can be done using this syntax:

while ($line = <STDIN>) { … }

but I don’t understand how to use that STDIN with
“MARC::Batch->new('USMARC',$file);”. This does not work:

$batch = MARC::Batch->new('USMARC',<STDIN>);


2.  My current script successfully reads and processes a MARC file of over
5 gigs, but exits entirely on record 160,585 with the error from
MARC::Batch, “Can't call method "as_string" on an undefined value at
./marc_batch.pl”.  The documentation on using MARC::Batch says that to tell
it to continue processing even when errors are encountered one should use
strict_off(), then print/report warnings at the bottom of the script. I don’t
think my particular error is being handled by the strict_off() setting. Does
anybody know what causes, or how to fix, the “Can’t call method as_string”
error? Full script below; it’s pretty short, thanks to MARC::Batch.

Thanks for any insights!


use MARC::Batch;

$file = $ARGV[0];
chomp($file);

$batch = MARC::Batch->new('USMARC',$file);
$batch->strict_off();   # otherwise script exits when encounters errors

open(OUT,'>new_marc');

while ( $record = $batch->next()) {
    $leader       = $record->leader();
    $leader_pos_6 = substr($leader,6,1);
    $leader_pos_7 = substr($leader,7,1);

    $field = $record->field('008');
    $field_008 = $field->as_string();
    $field_008_position_23 = substr($field_008,23,1);

    if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") &&
         ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {

        $control_num = $record->field('001');
        $control_num = $control_num->as_string();

        print "008 position 23: $field_008_position_23 \n";
        print "OLD leader: $leader \n";
        $old_leader = $leader;
        substr($leader,6,1) = 'm';
        print "NEW leader: $leader \n";

        print OUT $record->as_usmarc();
        print "$control_num|$old_leader|$leader|$field_008\n";

    } else {  # not a match so just print this one unchanged…
        print OUT $record->as_usmarc();
    }

}

# handles errors:
if (@warnings = $batch->warnings()) {
    print "\n Warnings detected: \n", @warnings;
}

close(OUT);
close(LOG);



John Guillory
Louisiana Library Network
225.578.3758
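
One more thing that may be worth checking in the script above, although nobody
raised it in the thread: in Perl, && binds more tightly than ||, so the if
test as written also matches any record whose 008 position 23 is "s", whatever
the leader says. The sketch below compares the test as written with a grouped
version. The grouping reflects an assumed intent (leader positions 6 and 7
must match, and position 23 is either "o" or "s"), and both function names are
made up for the illustration.

```perl
use strict;
use warnings;

# As written in the script: && binds tighter than ||, so the final
# "eq 's'" test stands alone and matches whatever the leader holds.
sub matches_as_written {
    my ($l6, $l7, $p23) = @_;
    return ( ($l6 eq 'a') && ($l7 eq 'm') && ($p23 eq 'o') || ($p23 eq 's') ) ? 1 : 0;
}

# Assumed intent: leader/06 is 'a' AND leader/07 is 'm'
# AND 008/23 is either 'o' or 's'.
sub matches_grouped {
    my ($l6, $l7, $p23) = @_;
    return ( $l6 eq 'a' && $l7 eq 'm' && ($p23 eq 'o' || $p23 eq 's') ) ? 1 : 0;
}

# A record with leader/06 'g' and 008/23 's': the two tests disagree.
printf "as written: %d, grouped: %d\n",
    matches_as_written('g', 'm', 's'),
    matches_grouped('g', 'm', 's');
# prints "as written: 1, grouped: 0"
```

If the grouped reading is what was meant, adding one pair of parentheses
around the two position-23 comparisons in the original if would be enough.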




Re: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread Timothy Prettyman
For your first question, instead of:

 $batch = MARC::Batch->new('USMARC',<STDIN>);

use:

 $batch = MARC::Batch->new('USMARC',STDIN);

For your second, the error is likely caused when a field you're calling
as_string() on doesn't exist in the record.

So, you could do something like the following:

$field = $record->field('008');
$field or do {                            # check for existence of field
   print "no 008 field for record\n";     # no field
   next;                                  # skip this record (or whatever)
};

$field_008 = $field->as_string();

Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan

