Re: Converting MARC fields with Catmandu - repeated subfields being squished together.
Patrick Hochstenbach wrote on Tue 10-06-2014 at 07:08 [+0200]:
> Sure
>
> join_field("subject.*"," ");
> join_field("subject","");
>
> The first join is for concatenating all the subfields. The second join
> is for all the field.

Thanks. I actually found out that Elasticsearch is totally happy with nested arrays, and they cause no problems at all, so I've just left it as it is and it's working great.

--
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
RE: Converting MARC fields with Catmandu - repeated subfields being squished together.
Hi Robin

Sure

    join_field("subject.*"," ");
    join_field("subject","");

The first join concatenates all the subfields within each field. The second join concatenates all the fields.

In the new Catmandu version we are enhancing the language a bit, which is why I might have written my previous examples with the new syntax.

Greetings from ELAG2014 in Bath!
Patrick

From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Monday, June 09, 2014 4:58 AM
To: perl4lib
Subject: Re: Converting MARC fields with Catmandu - repeated subfields being squished together.

Robin Sheat wrote on Mon 09-06-2014 at 14:50 [+1200]:
> $ cat test.fixes
> marc_map('650','subject',join:'###');
> remove_field('record');

Ah, I found that I need to change the syntax a bit:

    marc_map('650','subject', -split => 1);

gives me:

    {"subject":[["Counting","Pictorial works","Juvenile literature."],
                ["English language","Alphabet","Pictorial works","Juvenile literature.","14467"],
                ["Time","Pictorial works","Juvenile literature.","15531"],
                ["Children's stories, English","Pictorial works."]],
     "_id":"5567128"}

which is closer. Is there an easy way to flatten those arrays? Otherwise I can go with join and the split, but this way seems cleaner.

Actually, I wonder if nested arrays would work even better for my purposes, I guess I should test it...

--
Robin Sheat
Catalyst IT Ltd.
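As a rough illustration of what the two joins described above do to the nested `subject` structure, here is a plain Python sketch. It is not Catmandu code: it only mimics the data transformation as described in the message (inner arrays joined with a space, then the outer array joined with an empty string), and the sample data is a shortened copy of the output shown in this thread.

```python
# Illustration only: plain Python mimicking the two joins described above.
# This is an interpretation of the description, not Catmandu's implementation.
subject = [
    ["Counting", "Pictorial works", "Juvenile literature."],
    ["Time", "Pictorial works", "Juvenile literature.", "15531"],
]

# Step 1 (like joining "subject.*" with " "): one string per 650 field.
per_field = [" ".join(subfields) for subfields in subject]

# Step 2 (like joining "subject" with ""): one string for the whole record.
combined = "".join(per_field)

print(per_field)
print(combined)
```

Note that with an empty separator in the second step the fields run straight into each other, so a visible separator may be preferable if the fields need to stay distinguishable.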
Re: Converting MARC fields with Catmandu - repeated subfields being squished together.
Robin Sheat wrote on Mon 09-06-2014 at 14:50 [+1200]:
> $ cat test.fixes
> marc_map('650','subject',join:'###');
> remove_field('record');

Ah, I found that I need to change the syntax a bit:

    marc_map('650','subject', -split => 1);

gives me:

    {"subject":[["Counting","Pictorial works","Juvenile literature."],
                ["English language","Alphabet","Pictorial works","Juvenile literature.","14467"],
                ["Time","Pictorial works","Juvenile literature.","15531"],
                ["Children's stories, English","Pictorial works."]],
     "_id":"5567128"}

which is closer. Is there an easy way to flatten those arrays? Otherwise I can go with the join and the split, but this way seems cleaner.

Actually, I wonder if nested arrays would work even better for my purposes. I guess I should test it...

--
Robin Sheat
Catalyst IT Ltd.
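For the flattening question above, one option is to post-process the JSON outside Catmandu. The following is a minimal Python sketch under that assumption; it operates on the exported structure shown in this message and says nothing about how a Catmandu fix would do it.

```python
# Illustration only: flatten the nested "subject" arrays after export.
from itertools import chain

record = {
    "subject": [
        ["Counting", "Pictorial works", "Juvenile literature."],
        ["English language", "Alphabet", "Pictorial works", "Juvenile literature.", "14467"],
        ["Time", "Pictorial works", "Juvenile literature.", "15531"],
        ["Children's stories, English", "Pictorial works."],
    ],
    "_id": "5567128",
}

# chain.from_iterable concatenates the inner lists in order, one level deep.
flat_subjects = list(chain.from_iterable(record["subject"]))

print(flat_subjects)
```

Flattening discards which subfields belonged to the same 650 field, which is one reason keeping the nested arrays can be the better choice.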
Re: Converting MARC fields with Catmandu - repeated subfields being squished together.
Patrick Hochstenbach wrote on Fri 06-06-2014 at 06:53 [+0200]:
> By default all repeated subfields get joined by empty space, you can
> set this with the 'join' option:
>
> marc_map('650v','subject',join:'%%%')

This doesn't work:

    $ cat test.fixes
    marc_map('650','subject',join:'###');
    remove_field('record');

(The remove is just to make the results easier to see.)

The MARC record I'm experimenting with contains:

    650 0 _aCounting _vPictorial works _vJuvenile literature.
    650 0 _aEnglish language _xAlphabet _vPictorial works _vJuvenile literature. _914467
    650 0 _aTime _vPictorial works _vJuvenile literature. _915531
    650 0 _aChildren's stories, English _vPictorial works.

Running the conversion gives:

    $ catmandu convert MARC --fix test.fixes < test.marc
    can't load fix marc_map('650','subject',join:'###');
    remove_field('record');
    : Not enough arguments for join or string at (eval 85) line 1, near "join:"
    syntax error at (eval 85) line 1, near "join:"

Followed by a trace. The same happens when I attempt to use split:1, and pretty much anything after the two parameters.

--
Robin Sheat
Catalyst IT Ltd.
RE: Converting MARC fields with Catmandu - repeated subfields being squished together.
Btw I've updated the Fixes cheat sheet on our wiki to reflect your question :)

https://github.com/LibreCat/Catmandu/wiki/Fixes-Cheat-Sheet

From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Friday, June 06, 2014 5:11 AM
To: perl4lib
Subject: Converting MARC fields with Catmandu - repeated subfields being squished together.

I'm using catmandu to JSON-ise MARC records for storage in Elasticsearch, and seem to have come up with something that I can't readily see how to fix (without getting down and dirty with fixers.) I have a record that has this:

    ["650"," ","0","a","Time","v","Pictorial works","v","Juvenile literature.","9","15531"]

and a mapping:

    marc_map('650v', 'subject.$append')

This works well enough in most cases, however when the subfield is doubled up, I end up with:

    "subject":["Time","Pictorial worksJuvenile literature."]

The $append doesn't seem to apply in this case. This only seems to happen to repeats within a field; other 650$v subfields are in their own strings, though they suffer the same problem.

Is this a bug in Catmandu-MARC? I've tried reading the marc_map.pl file, but the lack of internal documentation and the nature of what it's doing make it not the easiest thing to understand.

--
Robin Sheat
Catalyst IT Ltd.
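To make the reported symptom concrete, here is a small Python sketch that pulls the repeated `v` subfields out of the array quoted above. It assumes, reading off that example, that the array is laid out as tag, two indicators, then alternating subfield code and value; the helper name `subfield_values` is hypothetical and is not part of Catmandu.

```python
# Illustration only: a hypothetical helper, not Catmandu-MARC code.
# Assumed layout (read off the example): [tag, ind1, ind2, code, value, code, value, ...]
def subfield_values(field, code):
    """Return every value of the given subfield code, in order, as separate strings."""
    pairs = field[3:]  # skip the tag and the two indicators
    codes, values = pairs[0::2], pairs[1::2]
    return [value for c, value in zip(codes, values) if c == code]

field = ["650", " ", "0", "a", "Time", "v", "Pictorial works",
         "v", "Juvenile literature.", "9", "15531"]

separate = subfield_values(field, "v")  # what the poster wanted
squished = "".join(separate)            # what joining with no separator produces

print(separate)
print(squished)
```

The second value reproduces the "Pictorial worksJuvenile literature." string from the report, which is what joining repeated subfields with no separator yields.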
RE: Converting MARC fields with Catmandu - repeated subfields being squished together.
Hi Robin

By default all repeated subfields get joined with an empty string. You can set the separator with the 'join' option:

    marc_map('650v','subject',join:'%%%')

gives you:

    "subject","Pictorial works%%%Juvenile"

Or, if you have many 650 fields they are all joined into one string:

    "subject","Pictorial works%%%Juvenile%%%foo%%%bar%%%test"

With the split_field command you can turn this into an array again:

    split_field('subject','%%%')

gives you

    "subject",["Pictorial works","Juvenile","foo","bar","test"]

Cheers
Patrick

PS. Indeed, marc_map.pl is a bit cryptic. We are compiling Perl scripts to make execution much faster. The developers are now figuring out how to refactor this compilation out so that the Fix packages are easier to read.
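The join-then-split round trip described above can be sketched in plain Python as follows. This only mirrors the string handling with the same `%%%` separator and sample values from the message; it is not Catmandu code and makes no claim about how marc_map or split_field are implemented.

```python
# Illustration only: join with a sentinel separator, then split back into a list.
SEPARATOR = "%%%"

values = ["Pictorial works", "Juvenile", "foo", "bar", "test"]

joined = SEPARATOR.join(values)     # one string, like the joined "subject"
restored = joined.split(SEPARATOR)  # back to a list, like after split_field

print(joined)
print(restored)
```

The round trip is only lossless if the separator never occurs inside a subfield value, so an unlikely sequence such as `%%%` is a sensible choice.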