Re: [CODE4LIB] MARCXML help again

Kevin S. Clarke Tue, 10 Jan 2017 15:34:42 -0800

Yes, the YMMV was intentional because people do do it and we all have our own 
paths to follow, but there are a lot of "if"s that need to be in place (a few 
of which you mention) to make it a painless process.


Kevin

 
 
-----Original message-----
> From:Roy Tennant
> Sent: Tuesday, January 10 2017, 6:07 pm
> To: [email protected]
> Subject: Re: [CODE4LIB] MARCXML help again
> 
> Well, I think that's a *bit* harsh. But the "YMMV" addition was appreciated, 
> because it can and will. That is, if the XML is completely consistent, then 
> Kyle's suggestion is an excellent one. If it isn't, then Kevin's link 
> applies, IMHO. Since it appears from what we have been told that the records 
> are consistent, I think Kyle's solution is not only workable but the most 
> efficient. Given the caveat stated above.
> Roy
> 
> > On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke <[email protected]> wrote:
> > 
> > On the mention of parsing XML with string operations, I'm compelled to post 
> > one of my favorite StackOverflow responses:
> > 
> > http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> > 
> > YMMV of course...
> > 
> > Kevin 
> > 
> > 
> > 
> > -----Original message-----
> >> From:Kyle Banerjee
> >> Sent: Tuesday, January 10 2017, 5:44 pm
> >> To: [email protected]
> >> Subject: Re: [CODE4LIB] MARCXML help again
> >> 
> >> Howdy Julie,
> >> 
> >> Depending on your specific needs, it's often easier/faster to use string
> >> rather than XML operations to work with XML.
> >> 
> >> Especially if you have a large number of files and/or the files are very
> >> big, stripping the whitespace between elements and then performing a simple
> >> string substitution would be a fast low tech way to remove the unwanted
> >> fields.
> >> 
> >> kyle
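
The whitespace-strip-then-substitute idea above can be sketched in Python roughly as follows. This is only an illustration under a strong assumption: every unwanted field is written exactly the same way (same attribute order, same quoting, same text) once the whitespace between elements is gone. The names TARGET and strip_target_fields are made up for this example and do not come from the thread.

```python
import re

# Hypothetical target: the unwanted 710 field with all whitespace
# between elements removed, so it can be matched as one plain string.
TARGET = (
    '<marc:datafield tag="710" ind1="2" ind2=" ">'
    '<marc:subfield code="a">Faux College</marc:subfield>'
    '<marc:subfield code="b">Special Collections</marc:subfield>'
    '</marc:datafield>'
)


def strip_target_fields(xml_text: str) -> str:
    """Collapse whitespace between tags, then delete exact copies of TARGET."""
    # Only whitespace sitting between '>' and the next '<' is removed;
    # attribute values such as ind2=" " and subfield text are left alone.
    collapsed = re.sub(r'>\s+<', '><', xml_text)
    # Plain string substitution: any field that differs at all is kept.
    return collapsed.replace(TARGET, '')
```

Note that the collapse step also empties any element whose only content is whitespace, and a field with reordered attributes or single quotes would slip through untouched, which is the consistency caveat in a nutshell.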
> >> 
> >> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[email protected]>
> >> wrote:
> >> 
> >>> Thanks to all who responded to my earlier plea for help.  I now have a new
> >>> problem.  I'm not sure if I can do this with find and replace in Oxygen,
> >>> or if this requires XSLT, or what.
> >>> 
> >>> I have a project of MARCXML records like this:
> >>> 
> >>> <?xml version="1.0" encoding="UTF-8" ?>
> >>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
> >>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >>>    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> >>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
> >>>  <marc:record>
> >>> <!--Lots of other datafields here -->
> >>>    <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>            <marc:subfield code="a">Faux College</marc:subfield>
> >>>            <marc:subfield code="b">Special Collections</marc:subfield>
> >>>        </marc:datafield>
> >>>  </marc:record>
> >>> </marc:collection>
> >>> 
> >>> I want to strip out all instances of:
> >>>    <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>            <marc:subfield code="a">Faux College</marc:subfield>
> >>>            <marc:subfield code="b">Special Collections</marc:subfield>
> >>>        </marc:datafield>
> >>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
> >>> instances intact.  I only want to delete ones with both the Faux College
> >>> and Special Collections text in the subfields.
> >>> 
> >>> Where would I go from here? I thought of doing an xsl:template match in an
> >>> XSL stylesheet, and then not providing any instructions for replacing the
> >>> match, but I don't know how to select for that specific text. My attempts
> >>> to figure that out have not worked. You can only read so much W3C
> >>> documentation and Stack Overflow before you need to just sit quietly and
> >>> stare at a wall for a while.
> >>> 
> >>> Thanks in advance --
> >>> 
> >>> Julie
> >>> 
> >> 
> Yes
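
For the case where the records are not perfectly consistent, a parser-based version of the removal Julie describes might look something like the sketch below. It uses only Python's standard xml.etree.ElementTree module. The function names is_target and remove_target_fields are invented for illustration, and the matching rule (tag 710, subfield a equal to "Faux College", subfield b equal to "Special Collections") is my reading of the request, not something confirmed in the thread.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Keep the familiar "marc:" prefix when the tree is written back out.
ET.register_namespace("marc", MARC_NS)


def is_target(field):
    """True for a 710 field whose $a and $b carry the unwanted text."""
    if field.get("tag") != "710":
        return False
    # Map subfield code -> trimmed text (a repeated code keeps the last value).
    subfields = {
        sf.get("code"): (sf.text or "").strip()
        for sf in field.findall("marc:subfield", NS)
    }
    return (
        subfields.get("a") == "Faux College"
        and subfields.get("b") == "Special Collections"
    )


def remove_target_fields(xml_text):
    """Return the XML with every matching 710 field removed."""
    root = ET.fromstring(xml_text)
    for record in root.iter("{%s}record" % MARC_NS):
        # Copy the list first so removing items does not disturb iteration.
        for field in list(record.findall("marc:datafield", NS)):
            if is_target(field):
                record.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Other 710 fields are left in place because the test looks at the subfield text, not just the tag. One thing to watch: ElementTree's default parser drops XML comments, so a file that needs its comments preserved would call for a different parser setup or an XSLT identity transform instead.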
