RE: sending marc records into a script that uses MARC::Batch

2014-06-03 Thread John E Guillory
Thanks for your ideas. I will try your suggestions.

John


Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Stefano Bargioni
If I'm not mistaken,
$batch->strict_off();
will prevent your loop from printing warnings and then stopping record processing.
HTH. Stefano
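
A minimal sketch of the switches involved, assuming the strict_off() and warnings_off() methods described in the MARC::Batch documentation. The helper name open_lenient_batch is made up for illustration and is not part of the module.

```perl
use strict;
use warnings;
use MARC::Batch;

# Hypothetical helper: open a batch that keeps going past bad records.
sub open_lenient_batch {
    my ($source) = @_;    # a filename or an already-open filehandle

    my $batch = MARC::Batch->new('USMARC', $source);

    # Strict mode is on by default; turning it off asks the batch to
    # continue after it meets what it believes is bad MARC data.
    $batch->strict_off();

    # Warnings are printed to STDERR by default; this silences the
    # printing. They can still be fetched with $batch->warnings().
    $batch->warnings_off();

    return $batch;
}
```

A caller would then loop with `while (my $record = $batch->next()) { ... }` as in the original script.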


Re: sending marc records into a script that uses MARC::Batch

2014-05-30 Thread Timothy Prettyman
I think you have to check for warnings as you read each record, so try
moving your error handling code right after the $batch->next() call.  But
Robin's suggestion is good advice, and is probably a more robust way to
handle the crud that can show up in a file of MARC records.

-Tim
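
A rough sketch of checking warnings record by record, as suggested above. It assumes $batch is a MARC::Batch object created by the caller and that warnings() returns the warnings accumulated so far and clears the buffer, as the module documentation describes. The helper name collect_warnings_per_record is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a batch and return a list of
# [record_number, warning_text] pairs, checked after every record.
sub collect_warnings_per_record {
    my ($batch) = @_;

    $batch->strict_off();    # do not stop at the first bad record

    my @report;
    my $n = 0;
    while (my $record = $batch->next()) {
        $n++;
        # Fetch whatever accumulated while reading record $n; the
        # call also empties the buffer, so nothing is reported twice.
        for my $warning ($batch->warnings()) {
            push @report, [$n, $warning];
        }
    }
    return @report;
}
```

Each pair could then be printed as, for example, "record 85877: Entirely empty subfield found in tag 260", which ties a warning to the record that produced it instead of reporting everything at the end of the run.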



Re: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread Timothy Prettyman
For your first question, instead of:

 $batch = MARC::Batch->new('USMARC', <STDIN>);

use:

 $batch = MARC::Batch->new('USMARC', STDIN);
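
A minimal sketch of reading piped records by handing MARC::Batch an open filehandle rather than a filename, which its documentation describes as supported. The helper name count_piped_records is made up for illustration. The glob reference \*STDIN used in the note below is one common way to pass the handle, and it also works under "use strict".

```perl
use strict;
use warnings;
use MARC::Batch;

# Hypothetical helper: read MARC records from an open filehandle
# (for example \*STDIN) and return how many were read.
sub count_piped_records {
    my ($fh) = @_;

    # Pass the filehandle itself where a filename would normally go.
    my $batch = MARC::Batch->new('USMARC', $fh);
    $batch->strict_off();    # keep going past records with warnings

    my $count = 0;
    while (my $record = $batch->next()) {
        $count++;
    }
    return $count;
}
```

With a final line such as `print count_piped_records(\*STDIN), "\n";` the script could be run as `some_dump_command | perl count_records.pl`, where both the command and the script name are placeholders.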

For your second, the error is likely caused when a field you're using
as_string() on doesn't exist in the record.

So, you could do something like the following:

$field = $record->field('008');
$field or do {                           # check for existence of field
   print "no 008 field for record\n";    # no field
   next;                                 # skip this record (or whatever)
};

$field_008 = $field->as_string();

Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan


On Thu, May 29, 2014 at 12:08 PM, John E Guillory jo...@lsu.edu wrote:

Hello,
Two questions please:

1.  I’ve written a script that opens a MARC file for reading using
this syntax:

$file = $ARGV[0];
$batch = MARC::Batch->new('USMARC', $file);

It then loops through the records using this syntax:

while ( $record = $batch->next() ) {
    # ... check positions 6 and 7 of the leader and position 23 of the 008
    # and make some changes
}

This works great. However, instead of accessing the file this way, I want
to pipe the output of a previously run marc dump command directly into this
script.

I understand that this can be done using this syntax: while ($line =
<STDIN>) { ... }, but I don’t understand how to use that STDIN with
"MARC::Batch->new('USMARC', $file);". This does not work:

$batch = MARC::Batch->new('USMARC', <STDIN>);

2.  My current script successfully reads and processes a MARC file of
over 5 gigs, but exits entirely on record 160,585 with the error from
MARC::Batch, "Can't call method "as_string" on an undefined value at
./marc_batch.pl". Documentation on using MARC::Batch says that to tell it
to continue processing even when errors are encountered one should use
strict_off(), then print/report warnings at the bottom of the script. I
don’t think my particular error is being handled by the strict_off()
setting. Does anybody know what causes, or how to fix, the "Can't call
method as_string" error? Full script below; it’s pretty short, thanks to
MARC::Batch.

Thanks for any insights!

use MARC::Batch;

$file = $ARGV[0];
chomp($file);

$batch = MARC::Batch->new('USMARC', $file);
$batch->strict_off();   # otherwise script exits when it encounters errors

open(OUT, '>new_marc');

while ( $record = $batch->next() ) {
    $leader       = $record->leader();
    $leader_pos_6 = substr($leader, 6, 1);
    $leader_pos_7 = substr($leader, 7, 1);

    $field     = $record->field('008');
    $field_008 = $field->as_string();
    $field_008_position_23 = substr($field_008, 23, 1);

    if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") &&
         ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {

        $control_num = $record->field('001');
        $control_num = $control_num->as_string();

        print "008 position 23: $field_008_position_23 \n";
        print "OLD leader: $leader \n";
        $old_leader = $leader;
        substr($leader, 6, 1) = 'm';
        print "NEW leader: $leader \n";

        print OUT $record->as_usmarc();
        print "$control_num|$old_leader|$leader|$field_008\n";

    } else {   # not a match so just print this one unchanged
        print OUT $record->as_usmarc();
    }

}

# handles errors:
if (@warnings = $batch->warnings()) {
    print "\n Warnings detected: \n", @warnings;
}

close(OUT);
close(LOG);

John Guillory
Louisiana Library Network
225.578.3758





RE: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread John E Guillory
Thanks Timothy for your help.

When processing about 5 million records I would expect some crazy records. The
new script (incorporating Timothy’s suggestions) exited prematurely on record
85,877 with: “Warnings detected: Entirely empty subfield found in tag 260”. I
know 260 is publication stuff but it’s not “required”. I’m deliberately
printing warnings, but again the script exited prematurely.

Thanks for assistance.
John







Re: sending marc records into a script that uses MARC::Batch

2014-05-29 Thread Robin Sheat
John E Guillory wrote on Thu 29-05-2014 at 21:13 [+]:
 “Warnings detected: Entirely empty subfield found in tag 260”

An entirely empty subfield is an illegally formatted thing, at least
according to the rules of MARC::Record/MARC::Field, and so I assume the
MARC format itself. So it's not that it's a required field or anything
like that, it's that the USMARC is incorrectly formatted, so the parser
throws an exception with 'die'.

To catch the exception rather than having your program terminate, you
need to wrap the call that's failing in an 'eval' block, and check for
errors after it, handling them appropriately. You might be lucky: the
file is OK and the parser can continue. Or you might be unlucky: this
corrupt record confuses the parser and it can't find the start of the
next record.

See 'perldoc -f eval' for more information on using it for
error/exception handling.
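
A sketch of that eval pattern, under some assumptions: $batch is a MARC::Batch object created by the caller, the helper name process_trapping_errors is invented here, and the cap on the number of trapped errors is an arbitrary safeguard for the unlucky case where the parser cannot recover.

```perl
use strict;
use warnings;

# Hypothetical helper: call $on_record->($record) for every record,
# trapping exceptions thrown while reading so that one bad record
# does not terminate the program. Returns the trapped error messages.
sub process_trapping_errors {
    my ($batch, $on_record, $max_errors) = @_;
    $max_errors //= 100;    # arbitrary cap, an assumption of this sketch

    my @errors;
    while (1) {
        my $record;
        # eval returns 1 only if next() completed without dying.
        my $ok = eval { $record = $batch->next(); 1 };
        if (!$ok) {
            push @errors, ($@ || 'unknown error');
            # Give up if reading keeps failing; the parser may have
            # lost track of where the next record starts.
            last if @errors >= $max_errors;
            next;
        }
        last unless defined $record;    # nothing more to read
        $on_record->($record);
    }
    return @errors;
}
```

It would be called as `my @errors = process_trapping_errors($batch, sub { my ($record) = @_; ... });`, after which @errors can be printed or logged.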

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF

