Re: Fwd: some transformations on file

2013-01-21 Thread samuel desseaux
Hi Paul,

Yes, it's exactly the approach I'm trying to follow.

I have my algorithm, but for the moment it's a bit hard for me to write good
code (I hope to have more time to learn Perl).


samuel




Re: Fwd: some transformations on file

2013-01-20 Thread Marc Chantreux
hello,

Just two notes about your attachments:

* Please don't attach files on the mailing list: it's unsolicited content.
  Provide download URLs instead.
* Those are not MARC files, so the examples given below won't work until
  you have converted them to ISO 2709.

On Sun, Jan 20, 2013 at 03:54:18PM +0100, samuel desseaux wrote:
 Hi,
 
 I'm working on files for our library and I need some help.
 
 I have one file with all the biblio records and one with the items. A biblio
 record can have one or more items.
 
 First operation: I want to compare the two files, using field 001 as the
 identifier. I want the results in two separate files:
 
 1st: all the items whose 001 field matches a biblio record
 
 2nd: all the items whose 001 field matches no biblio record

Not tested, but here is a good base:
  
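use Modern::Perl;
use autodie;
use MARC::MIR;

# one report file per outcome: do.matches.txt and dont.matches.txt
my %biblio;
my %report;
open $report{$_}, '>', "$_.matches.txt" for qw< do dont >;

# remember every 001 found in the biblio file
marawk { $biblio{(record_id)} = 1 } 'biblio.mrc';

# write each item's 001 to the "do" or "dont" report
marawk {
    my $id = record_id;
    my $as = $biblio{$id} ? 'do' : 'dont';
    say { $report{$as} } $id;
} 'items.mrc';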
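Without MARC::MIR, the matching itself is just a hash lookup. A minimal core-Perl sketch, assuming the 001 values have already been pulled out of both files into plain lists (the sub name `split_items` and the sample ids are hypothetical):

```perl
use strict;
use warnings;

# Split item 001 values into those that match a bib record ("do")
# and those that match none ("dont"). Takes two array refs of ids.
sub split_items {
    my ($bib_ids, $item_ids) = @_;
    my %is_bib;
    @is_bib{@$bib_ids} = ();    # hash slice: every bib 001 becomes a key
    my (@do, @dont);
    for my $id (@$item_ids) {
        if (exists $is_bib{$id}) { push @do,   $id }
        else                     { push @dont, $id }
    }
    return (\@do, \@dont);
}

my ($do, $dont) = split_items([qw(1029 3884 1650)], [qw(1029 1650 9999 3884)]);
print "do: @$do\n";      # do: 1029 1650 3884
print "dont: @$dont\n";  # dont: 9999
```

Because the bib ids live in a hash, each lookup is constant-time, so the split needs one pass over each file instead of comparing every item with every bib record.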


 Second operation: in my items file, all the items of the same biblio record
 have the same 001 field, but they are separate records. I'd like to join all
 the items under a single 001 field.

a) be careful: it loads the whole file into memory
b) not tested :)

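use Modern::Perl;
use autodie;
use MARC::MIR;

# collect every item record under its 001 (the whole file stays in memory)
my %items_for;
marawk { push @{ $items_for{(record_id)} }, $_ } 'items.mrc';

# write the items back out, grouped by 001
open my $fh, '>', 'sorted.items.mrc';
for my $items ( values %items_for ) {
    print {$fh} to_iso2709 for @$items;
}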
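The hash-based version keeps every item in memory and, since Perl hash order is unspecified, writes the groups in no particular order. If the items file is already sorted by 001 (an assumption to check first), consecutive records with the same 001 can instead be grouped in a single streaming pass that holds only one group at a time. A core-Perl sketch; `group_sorted`, the `$next` iterator returning `[ $id, $record ]` pairs, and the `$emit` callback are hypothetical names:

```perl
use strict;
use warnings;

# Group consecutive records sharing the same id, keeping only one
# group in memory at a time. $next returns [ $id, $record ] or undef;
# $emit is called as $emit->($id, \@records) for each finished group.
sub group_sorted {
    my ($next, $emit) = @_;
    my ($current_id, @group);
    while (my $pair = $next->()) {
        my ($id, $record) = @$pair;
        if (@group && $id ne $current_id) {
            $emit->($current_id, [@group]);    # id changed: flush previous group
            @group = ();
        }
        $current_id = $id;
        push @group, $record;
    }
    $emit->($current_id, [@group]) if @group;  # flush the last group
}

# Usage with an in-memory list standing in for the items file
my @items = ([1029, 'I1'], [1029, 'I3'], [1650, 'I2'], [1650, 'I6']);
group_sorted(sub { shift @items }, sub {
    my ($id, $recs) = @_;
    print "$id: @$recs\n";    # 1029: I1 I3, then 1650: I2 I6
});
```

On unsorted input, a 001 that reappears later simply starts a new group, so the hash-based version remains the safe choice in that case.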

 Then, with the new file, I want to merge with the biblio records: if I find
 two identical 001 fields, I attach the items to the biblio record.

I don't get it. You want to merge item records and biblio records?

 Third operation: how can I correct some badly encoded data? It's due to the
 old database, which doesn't respect UTF-8.

I see no problem in the provided content.
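Should some fields turn out to be badly encoded after all, one way to find and repair them is to test each value for valid UTF-8 with the core Encode module. The sketch below assumes that anything which is not valid UTF-8 is Latin-1 (ISO-8859-1); that is only a guess about the old ILS, and data stored in MARC-8 would need a different conversion. The sub name `to_utf8` is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Return the value as a UTF-8 byte string. Valid UTF-8 is kept as is;
# anything else is assumed (hypothetically) to be Latin-1 and converted.
sub to_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from consuming $bytes as it goes
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $bytes if defined $text;    # already valid UTF-8
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $fixed = to_utf8("caf\xE9");    # Latin-1 "café"
print length($fixed), "\n";        # 5: the é is now two UTF-8 bytes
```

Running every field value through such a check before writing the records out would at least show which ones the old database stored in a different encoding.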

regards
marc


Re: Fwd: some transformations on file

2013-01-20 Thread Marc Chantreux
hello, 

can you send me the link to the MARC files again?

regards,
marc

-- 
Marc Chantreux
Université de Strasbourg, Direction Informatique
14 Rue René Descartes,
67084  STRASBOURG CEDEX
☎: 03.68.85.57.40
http://unistra.fr
Don't believe everything you read on the Internet
-- Abraham Lincoln


Re: Fwd: some transformations on file

2013-01-20 Thread Marc Chantreux
Oops! Sorry about that: wrong recipient.


On Sun, Jan 20, 2013 at 06:14:20PM +0100, Marc Chantreux wrote:
 hello, 
 
 can you send me the link to the MARC files again?
 
 regards,
 marc
 


Re: Fwd: some transformations on file

2013-01-20 Thread samuel desseaux
* If it's a better solution, I will put my files (converted to ISO 2709) on
Dropbox.

* The goal is to join the items properly with the biblio records. As we have
two separate files, it's a bit hard. With MarcEdit, if I merge these two
files, it's limited: MarcEdit doesn't understand that one biblio record can
have more than one item :-). I won't say any more about my library and its
exotic old ILS that I've migrated to Koha.


Re: Fwd: some transformations on file

2013-01-20 Thread Paul Hoffman
On Sun, Jan 20, 2013 at 06:43:38PM +0100, samuel desseaux wrote:
 * The goal is to join the items properly with the biblio records.

Let's assume that you have these two files:

(B) Three MARC bibliographic records

1. 001 = 1029
2. 001 = 3884
3. 001 = 1650
(etc.)

(I) Seven MARC item records

1. 001 = 1029
2. 001 = 1650
3. 001 = 1029
4. 001 = 3884
5. 001 = 3884
6. 001 = 1650
7. 001 = 1650

Do you want to produce a *new* file of three records, like this?

1. I1 + I3
2. I4 + I5
3. I2 + I6 + I7

Is this really what you want to have in the end?

 As we have two separate files, it's a bit hard. With MarcEdit, if I
 merge these two files, it's limited: MarcEdit doesn't understand that
 one biblio record can have more than one item :-). I won't say any
 more about my library and its exotic old ILS that I've migrated to Koha.

It sounds as though what you *really* want in the end is a *single* file 
of three MARC records, like this:

B1 + I1 + I3
B2 + I4 + I5
B3 + I2 + I6 + I7

Is that right?  Here's a rough start in Perl:

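8<8<8<8<8<8<8<
use strict;
use warnings;
use MARC::File::USMARC;

# Still to be written: read_next_record_from($file) (e.g. a wrapper
# around $file->next) and sysnum($marc) (e.g. the data of field 001).
my ($bib_records_file, $item_records_file) = @ARGV;   # assumed: file names on the command line
my ($file, %records);

# One group per bib record, keyed by its system number
$file = MARC::File::USMARC->in($bib_records_file);
while (my $bib_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($bib_marc);
    $records{$sysnum} = [ $bib_marc ];
}
$file->close;

# Append each item record to the group with the same system number
$file = MARC::File::USMARC->in($item_records_file);
while (my $item_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($item_marc);
    push @{ $records{$sysnum} }, $item_marc;
}
$file->close;

# Print each group as raw USMARC, bib record first
for my $group (values %records) {
    print $_->as_usmarc for @$group;
}
8<8<8<8<8<8<8<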
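The grouping done by this skeleton can be tried in memory on the example above, with labels such as `B1` and `I3` standing in for real MARC records (a simplification; `merge_groups` is a hypothetical name). This variant also keeps the bib records in file order and sets aside items whose 001 matches no bib record:

```perl
use strict;
use warnings;

# Example records as [ label, 001 ] pairs; strings stand in for MARC records
my @bibs  = ([B1 => 1029], [B2 => 3884], [B3 => 1650]);
my @items = ([I1 => 1029], [I2 => 1650], [I3 => 1029], [I4 => 3884],
             [I5 => 3884], [I6 => 1650], [I7 => 1650]);

# Group by 001: the bib first, then its items in file order.
# Returns the merged groups and the items that matched no bib.
sub merge_groups {
    my ($bibs, $items) = @_;
    my (%group, @order, @orphans);
    for my $bib (@$bibs) {
        my ($label, $id) = @$bib;
        push @order, $id;              # remember the bib file order
        $group{$id} = [$label];
    }
    for my $item (@$items) {
        my ($label, $id) = @$item;
        if (exists $group{$id}) { push @{ $group{$id} }, $label }
        else                    { push @orphans, $label }    # no such bib
    }
    my @merged = map { join(' + ', @{ $group{$_} }) } @order;
    return (\@merged, \@orphans);
}

my ($merged, $orphans) = merge_groups(\@bibs, \@items);
print "$_\n" for @$merged;
# B1 + I1 + I3
# B2 + I4 + I5
# B3 + I2 + I6 + I7
```

In the skeleton, `values %records` returns the groups in no particular order, and an item with an unknown system number silently creates a group with no bib record in it; tracking `@order` and `@orphans` addresses both.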

Let us know if you need help writing read_next_record_from() or 
sysnum().
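Since samuel plans to convert the files to ISO 2709, the two helpers could also be written without any CPAN module by reading the raw format directly. This is a hedged sketch, not what Paul had in mind: unlike his skeleton, these versions take a plain filehandle (opened with `'<:raw'`) instead of a MARC::File::USMARC object, return raw record strings, and assume records are stored back to back with the usual `4500` directory layout. They rely on the standard ISO 2709 structure: a 24-byte leader whose positions 12-16 hold the base address of data, 12-byte directory entries (tag, field length, start position), 0x1E as field terminator and 0x1D as record terminator.

```perl
use strict;
use warnings;

# Read the next raw ISO 2709 record from a filehandle. Records end with
# the record terminator 0x1D. Returns undef at end of file, or when only
# trailing junk (such as a final newline) is left.
sub read_next_record_from {
    my ($fh) = @_;
    local $/ = "\x1D";
    my $raw = <$fh>;
    return unless defined $raw && $raw =~ /\x1D\z/;
    return $raw;
}

# Return the data of the 001 field of a raw record, or undef if absent.
sub sysnum {
    my ($raw) = @_;
    my $base = substr($raw, 12, 5);             # leader/12-16: base address of data
    my $dir  = substr($raw, 24, $base - 25);    # directory without its 0x1E terminator
    for (my $pos = 0; $pos + 12 <= length $dir; $pos += 12) {
        my ($tag, $len, $start) = unpack 'a3 a4 a5', substr($dir, $pos, 12);
        next unless $tag eq '001';
        (my $data = substr($raw, $base + $start, $len)) =~ s/\x1E\z//;  # drop field terminator
        return $data;
    }
    return;
}

# Usage (items.mrc is a hypothetical file name):
# open my $fh, '<:raw', 'items.mrc' or die $!;
# while (my $raw = read_next_record_from($fh)) { print sysnum($raw), "\n" }
```

Working on raw strings keeps the sketch dependency-free; with MARC::Record available, `$file->next` and the 001 field's `data` would be the more robust route.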

Paul.

-- 
Paul Hoffman nkui...@nkuitse.com