RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-09 Thread Patrick Hochstenbach
Hi Robin

Sure

 join_field('subject.*', ' ');
 join_field('subject', '<br>');

The first join concatenates the subfields within each field. The second join 
concatenates all the fields.
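
As a rough illustration of what the two joins do to the nested arrays, here is a plain-Python sketch. It is not Catmandu code: the helper name `join_nested` and the separators (a space and `<br>`) are assumptions made for the example, and the data is a shortened copy of the nested output quoted in this thread.

```python
# Plain-Python illustration of the two join steps; not Catmandu code.
# Data is a shortened copy of the nested "subject" output from this thread.
subject = [
    ["Counting", "Pictorial works", "Juvenile literature."],
    ["Children's stories, English", "Pictorial works."],
]


def join_nested(fields, inner_sep=" ", outer_sep="<br>"):
    """Join each inner list first, then join the resulting strings.

    Both separators are assumptions chosen for this example.
    """
    # Step 1: one string per field (the join on 'subject.*').
    per_field = [inner_sep.join(subfields) for subfields in fields]
    # Step 2: one string for the whole record (the join on 'subject').
    return outer_sep.join(per_field)


print(join_nested(subject))
```

Skipping the second step leaves one string per 650 field, which is a flat array rather than a nested one.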

In the new Catmandu version we are enhancing the language a bit; that's why I 
may have written my previous examples in the new syntax.

Greetings from ELAG2014 in Bath!

Patrick


Re: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-09 Thread Robin Sheat
Patrick Hochstenbach wrote on Tue 10-06-2014 at 07:08 [+0200]:
> Sure
> 
>  join_field('subject.*', ' ');
>  join_field('subject', '<br>');
> 
> The first join is for concatenating all the subfields. The second join
> is for all the field.

Thanks.

I actually found out that Elasticsearch is totally happy with nested
arrays, and they're causing no problems at all like that, so I've just
left it as it is and it's working great.

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF




Re: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-08 Thread Robin Sheat
Patrick Hochstenbach wrote on Fri 06-06-2014 at 06:53 [+0200]:
> By default all repeated subfields get joined by empty space, you can
> set this with the 'join' option:
> 
> marc_map('650v','subject',join:'%%%')

This doesn't work:

$ cat test.fixes
marc_map('650','subject',join:'###');
remove_field('record');

(the remove is just to make the results easier to see.)

In the MARC record I'm experimenting with:

650  0 _aCounting
   _vPictorial works
   _vJuvenile literature.
650  0 _aEnglish language
   _xAlphabet
   _vPictorial works
   _vJuvenile literature.
   _914467
650  0 _aTime
   _vPictorial works
   _vJuvenile literature.
   _915531
650  0 _aChildren's stories, English
   _vPictorial works.

$ catmandu convert MARC --fix test.fixes < test.marc
can't load fix marc_map('650','subject',join:'###');
remove_field('record');
: Not enough arguments for join or string at (eval 85) line 1, near "join:"
syntax error at (eval 85) line 1, near "join:"

Followed by a trace. The same goes when I attempt to use split:1, and
pretty much anything after the two parameters.

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF




Re: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-08 Thread Robin Sheat
Robin Sheat wrote on Mon 09-06-2014 at 14:50 [+1200]:
> $ cat test.fixes
> marc_map('650','subject',join:'###');
> remove_field('record');

Ah, I found that I need to change the syntax a bit:

marc_map('650','subject', -split => 1);

gives me:

{"subject":[["Counting","Pictorial works","Juvenile literature."],
["English language","Alphabet","Pictorial works","Juvenile literature.","14467"],
["Time","Pictorial works","Juvenile literature.","15531"],
["Children's stories, English","Pictorial works."]],"_id":"5567128"}

which is closer. Is there an easy way to flatten those arrays?

Otherwise I can go with join and the split, but this way seems cleaner.

Actually, I wonder if nested arrays would work even better for my
purposes, I guess I should test it...

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF




Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-05 Thread Robin Sheat
I'm using catmandu to JSON-ise MARC records for storage in
Elasticsearch, and seem to have come up with something that I can't
readily see how to fix (without getting down and dirty with fixers.)

I have a record that has this:

["650"," ","0","a","Time","v","Pictorial works","v","Juvenile
literature.","9","15531"]

and a mapping:

marc_map('650v', 'subject.$append')

This works well enough in most cases, however when the subfield is
doubled up, I end up with:

"subject":["Time","Pictorial worksJuvenile literature."]

The $append doesn't seem to apply in this case. This only seems to
happen to repeats within a field, other 650$v subfields are in their own
strings, though suffer the same problem.

Is this a bug in Catmandu-MARC? I've tried reading the marc_map.pl file,
but the lack of internal documentation, and the nature of what it's
doing make it not the easiest thing to understand.

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38  8505 77F5 B724 F871 3BDF




RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-05 Thread Patrick Hochstenbach
Hi Robin

By default all repeated subfields are joined with an empty string (no 
separator); you can change this with the 'join' option:

marc_map('650v','subject',join:'%%%')

gives you:

subject,Pictorial works%%%Juvenile

Or, if you have many 650 fields they are all joined into one string:

subject,Pictorial works%%%Juvenile%%%foo%%%bar%%%test

With the split_field command you can turn this back into an array:

split_field('subject','%%%')

gives you

subject,[Pictorial works,Juvenile,foo,bar,test]
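
For readers who want to see the round trip outside Catmandu, here is a plain-Python sketch of the same join-then-split idea. It is not Catmandu code, and the grouping of the values into two 650 fields is an assumption made for the example.

```python
# Plain-Python illustration of join-then-split; not Catmandu code.
SEP = "%%%"  # separator from the example above

# Subfield values per 650 field. Which value belongs to which field is an
# assumption; only the flat list of values comes from the example above.
fields = [["Pictorial works", "Juvenile"], ["foo", "bar", "test"]]

# Step 1: everything ends up in one string, as with the 'join' option.
joined = SEP.join(value for field in fields for value in field)

# Step 2: splitting on the same separator gives back a flat list.
flat = joined.split(SEP)

print(joined)
print(flat)
```

The round trip only works when the separator never occurs in the data itself, so an unusual string such as %%% is a safer choice than a space or a comma.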

Cheers
Patrick

PS. Indeed, marc_map.pl is a bit cryptic. We compile the fixes into Perl code 
to make execution much faster. The developers are now figuring out how to 
refactor this compilation out so that the Fix packages are easier to read.


RE: Converting MARC fields with Catmandu - repeated subfields being squished together.

2014-06-05 Thread Patrick Hochstenbach
Btw, I've updated the Fixes cheat sheet on our wiki to reflect your question :)

https://github.com/LibreCat/Catmandu/wiki/Fixes-Cheat-Sheet
