Jon Gorman wrote:
You could substitute XML with e.g. Base64 encoding if it makes thinking
about this stuff easier. For instance email clients often send binary files
in Base64, but it doesn't mean the file is ruined, as the receiving email
client can decode it back to the original binary.
A bit of an ironic statement, considering a regular, constant
complaint on several library-related mailing lists I'm on is that
emails are coming in "garbled" or need to be sent again in "plain
text". Without fail it's because the person is using a client that
won't or can't deal with Base64. Yes, silly in this day and age.
Well, yeah, of course the receiving party could be broken, but there's
definitely all the information and rules it could use to decode the data.
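To make the round trip concrete, here is a minimal Python sketch using only the
standard library. The sample bytes are made up for illustration; any binary
payload would behave the same way.

```python
import base64

# Arbitrary binary payload: every possible byte value once.
original = bytes(range(256))

# What a sending client does: binary -> ASCII-safe Base64 text.
encoded = base64.b64encode(original)

# What a working receiving client does: Base64 text -> original binary.
decoded = base64.b64decode(encoded)

# The transport encoding changed the representation, not the content.
round_trip_ok = decoded == original
```

A client that shows the encoded text instead of decoding it is broken, but
nothing has been lost in transit.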
Perhaps I'm just jaded from working in libraries for too long, but
your examples assume some logical, consistent control through the
process of dealing with MARC data.
Not really, just some consistency in working with XML.
while later they start raising alarms, either because they see the
markup in the record and wonder what's happening and how to remove it,
or because one of those tools treats that area as text, another as XML
content, and somewhere along the way it gets messed up.
I failed to make the statement that I definitely don't support adding
HTML to MARC records (unless at least a clean and backwards-compatible
way to do that is devised). My point was just that the transport
mechanism is irrelevant.
database that was put there by the original MARC. The code is
horribly set up and hackish, and for certain fields it does not bother
to escape what it's retrieving from the record. You then go to import to your
new ILS, which validates the MARCXML. It of course now croaks because
you have something like
<marc:subfield code="a"><div class="foo">pretty</div>.
You're screwed then. You don't need embedded HTML to break it. For
instance a simple < ("less than") character in a field would do. This is
an example of very broken XML creation. I've done it myself too, but
fixed it asap to avoid more embarrassment. All this, however, is not a
useful argument in the HTML in MARC records debate.
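As a rough illustration of the difference between broken and correct XML
creation, here is a small Python sketch using only the standard library. The
element name and the sample value are made up for the example and are not taken
from any real ILS.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A field value containing characters that are special in XML.
value = 'a < b & <div class="foo">pretty</div>'

# Broken: the value is pasted into the markup without escaping.
broken = f'<subfield code="a">{value}</subfield>'
try:
    ET.fromstring(broken)
    broken_parses = True
except ET.ParseError:
    broken_parses = False  # a parser rejects this as not well-formed

# Correct: escape the text before embedding it in the markup.
safe = f'<subfield code="a">{escape(value)}</subfield>'
recovered = ET.fromstring(safe).text

# The receiving side gets the original value back, markup characters included.
intact = recovered == value
```

With escaping in place, the receiving side sees the embedded markup as plain
text, exactly as it was stored in the field.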
Would you count on having someone on the staff who will be able to fix
those MARCXML files? Or did you have someone like that and they
burned out? How long before the support contract on your old ILS
forces you to abandon it?
Ok, this is drifting off-topic, but if you have a way to get the
original MARC records in ISO2709 format (or any other non-broken
format), it's a very simple task to convert them to valid MARCXML.
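For what it's worth, here is a minimal Python sketch of such a conversion for a
single record, using only the standard library. It is a sketch under several
assumptions: the record is UTF-8 encoded (no MARC-8 handling), the directory
uses the usual MARC 21 layout of 12-byte entries, and there is no error
handling. The function name is my own invention; in practice an established
MARC library would be the safer choice.

```python
import xml.etree.ElementTree as ET

FIELD_END = b"\x1e"   # ISO 2709 field terminator
SUBFIELD = b"\x1f"    # ISO 2709 subfield delimiter
NS = "http://www.loc.gov/MARC21/slim"


def iso2709_to_marcxml(raw: bytes) -> str:
    """Convert one ISO 2709 record (assumed UTF-8) to a MARCXML string."""
    ET.register_namespace("marc", NS)
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])            # base address of the data area
    directory = raw[24:base - 1]         # drop the directory's terminator

    record = ET.Element(f"{{{NS}}}record")
    ET.SubElement(record, f"{{{NS}}}leader").text = leader

    # Each directory entry: 3-char tag, 4-digit length, 5-digit start offset.
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field data and drop its trailing field terminator.
        data = raw[base + start:base + start + length - 1]

        if tag < "010":                  # control fields 001-009
            cf = ET.SubElement(record, f"{{{NS}}}controlfield", tag=tag)
            cf.text = data.decode("utf-8")
        else:                            # data fields: indicators + subfields
            indicators, *subfields = data.split(SUBFIELD)
            ind = indicators.decode("ascii")
            df = ET.SubElement(record, f"{{{NS}}}datafield",
                               tag=tag, ind1=ind[0], ind2=ind[1])
            for sub in subfields:
                sf = ET.SubElement(df, f"{{{NS}}}subfield", code=chr(sub[0]))
                sf.text = sub[1:].decode("utf-8")

    # The serializer escapes <, > and & in field content.
    return ET.tostring(record, encoding="unicode")
```

Because the XML is produced by a serializer rather than by string
concatenation, a "less than" character in a field comes out escaped and the
result stays well-formed.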
I mainly sent out this email though because I don't think the folks
who have been pointing out issues are confused. It's not that we
don't understand that it should be able to "round-trip" or that we
haven't played around with HTML in other data formats. I think we've
used enough software in the library world to not trust that all the layers
will work as they should.
And my response to Jonathan was simply trying to state that encoding the
stuff in XML doesn't make a difference. I agree with you that there are
interoperability issues, but there are also examples in the library
world where stuff just works even when XML is involved, and the
receiving party can get the data intact.
--Ere