I think you have hit the nail on the head - we need to be clear about the
difference between the content (displayed in the editor) and the
serialization of the content (to XML).
So yes, if you type "a<b" in the editor, it should correctly handle
translation between the displayed format and the XML serialisation. And it
should correctly translate between a displayed "&" and serialised "&amp;".
That still leaves open the question of how characters such as © should be
entered in the admin client.
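To make the content/serialisation distinction concrete, here is a minimal sketch in Python using the standard library's xml.sax.saxutils. It is not the admin client's code; the function names to_xml_text and to_displayed are made up for illustration. It only shows the round trip we would expect any editor to perform.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_text(displayed: str) -> str:
    """Serialise what the user typed as XML character data (hypothetical helper)."""
    # escape() replaces "&", "<" and ">" with "&amp;", "&lt;" and "&gt;".
    return escape(displayed)


def to_displayed(xml_text: str) -> str:
    """Turn serialised character data back into what the editor should show."""
    # unescape() reverses the three replacements made by escape().
    return unescape(xml_text)


if __name__ == "__main__":
    for shown in ["a<b", "fish & chips", "a &copy; b", "\u00a9 2009"]:
        stored = to_xml_text(shown)
        # The round trip must give back exactly what was typed.
        assert to_displayed(stored) == shown
        print(repr(shown), "->", repr(stored))
```

Note that a literal © needs no escaping at all as long as the document encoding (eg UTF-8) can represent it, so the open question is really about how the user types it, not how it is stored.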
-----Original Message-----
From: Schwichtenberg, Frank [mailto:[email protected]]
Sent: 19 August 2009 10:41
To: Richard Green; Steve Bayliss; Fedora Commons Developers
Subject: Re: [Fedora-commons-developers] New admin client
Hello,
one part of the problem seems to be the encoding of special characters in
XML. There is a difference between content and the XML serialization of
content. If I put the string "a<b" in an editor's field I would expect
"a&lt;b" inside the XML document. When showing the content from that XML
document to a user, the entity reference must be resolved to the character
it stands for, resulting in "a<b". That should be done for all special
characters, and so there is no problem with entity references as content.
The string "a &copy; b" is serialized in XML as "a &amp;copy; b". When using
that content in an HTML document, the entity reference must be resolved to
the appropriate character.
Maybe I did not get the problem.
Cheers, Frank
From: Richard Green [mailto:[email protected]]
Sent: Wednesday, 19 August 2009 11:25
To: Steve Bayliss; Fedora Commons Developers
Subject: Re: [Fedora-commons-developers] New admin client
Potentially there could be a problem with re-editing?
Personally I can live with a copyright symbol being &#169; but see below.
There are other things that give problems too. The real issue, I suggest,
is with (eg) dc:description. Like it or not there will be people using the
editor to edit their descriptive metadata and so dc:description is going to
get things like abstracts thrown at it. From bitter experience I know that
chemical (etc) abstracts regularly contain '<' (concentrations < 5ppm) and
all abstracts are capable of producing a '&' when you're not looking. Both
these, predictably, cause the editor to panic.
So, you put in the numeric codes and save, and when you re-open, what have
you got? &lt; and &amp;. So why can't I put these in in the first place (I
can), and also &copy; (I can't)? It's inconsistent. Putting in < or & with
Alt+xxx isn't going to help - you'll just get an illegal character.
If we go for numeric codes there are too many to remember, so a drop-down?
R
___________________________________________________________________
Richard Green
Consultant to the University of Hull IT Systems Group
managing the CLIF and Hydra (Hull) Projects
http://edocs.hull.ac.uk
http://www.hull.ac.uk/clif
https://fedora-commons.org/confluence/display/hydra
From: Steve Bayliss [mailto:[email protected]]
Sent: 18 August 2009 17:09
To: 'Fedora Commons Developers'
Cc: Richard Green
Subject: Re: [Fedora-commons-developers] New admin client
We had a discussion on this on the Committer Meeting call today.
Taking a look at http://dublincore.org/documents/dcmes-xml/, 2.5. Language
and character encoding - this says that HTML entities should not be used;
but for instance &#169; for the copyright symbol is ok. And the way that the
DC datastream is wrapped in FOXML would cause problems in declaring these
HTML entities. So in the FOXML the HTML entities (if allowed in the admin
client) would need converting to the character code representations.
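As a rough illustration of that conversion, here is a sketch in Python (not the admin client's implementation; named_to_numeric is a hypothetical helper). It rewrites HTML named entities as numeric character references using the standard html.entities table, and leaves the XML predefined entities and any unrecognised names untouched.

```python
import re
from html.entities import name2codepoint

# Matches references of the form &name; (hypothetical helper pattern).
_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# These five are predefined in XML itself and need no conversion.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text: str) -> str:
    """Replace HTML named entities with numeric character references."""

    def swap(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names as they are.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED.sub(swap, text)


if __name__ == "__main__":
    print(named_to_numeric("&copy; 2009 Hull &amp; partners"))
```

Whether unknown names should be passed through, rejected, or escaped is a design decision for the client; passing them through here just keeps the sketch short.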
It would seem that this is really a usability issue for the new admin client
- ie how to make it easy for users to enter symbols such as the copyright
symbol?
Should the admin client handle this at all, or leave it to the platform to
deal with (eg, in Windows you could enter © by typing Alt+0169, or by using
Character Map)?
What do people think? Provide buttons/dropdowns etc for entering special
symbols; allow typing HTML entities but convert straight to the character
code equivalent? Other suggestions?
Steve
-----Original Message-----
From: Bill Branan [mailto:[email protected]]
Sent: 05 August 2009 15:12
To: Peter Cliff
Cc: Richard Green; Fedora Commons Developers
Subject: Re: [Fedora-commons-developers] New admin client
Hi Pete,
I believe that you're correct in that the entity definitions for these
characters are just not included, so when the XML is processed during the
add/modify datastream calls the parsing fails. I've added an issue in JIRA
for this: http://fedora-commons.org/jira/browse/FCREPO-520.
Bill
On Wed, Aug 5, 2009 at 6:04 AM, Peter Cliff <[email protected]>
wrote:
Possibly not relevant at all - having not tried to enter & anything into
the new admin client! ;-) - but (I expect you know) you need to define
entities with names (&copy;) etc.
See:
http://www.xml.com/pub/a/98/08/xmlqna2.html
http://www.tizag.com/xmlTutorial/xmlentity.php
So my guess is that somewhere some XML parsing/creating is happening
behind the scenes of that client and it is throwing the whole thing off
when the XML processing fails on account of an undefined entity?
I couldn't find any entity definitions for the HTML named ones in the
src/xsd/ (aside from the reference in xhtml1-strict.xsd). Do there need
to be some?
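For what it's worth, the failure is easy to reproduce outside the client. The sketch below uses Python's standard xml.etree.ElementTree rather than whatever parser Fedora uses behind the scenes, so take it as an illustration of the general XML behaviour only; the helper name parses is made up.

```python
import xml.etree.ElementTree as ET


def parses(xml: str) -> bool:
    """Return True if the string is accepted by the XML parser."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# &copy; is an HTML entity; with no DTD declaring it, XML parsing fails.
named = "<description>&copy; 2009</description>"
# A numeric character reference needs no declaration.
numeric = "<description>&#169; 2009</description>"
# &lt; and &amp; are predefined in XML itself.
predefined = "<description>5 &lt; 6 &amp; 7</description>"

if __name__ == "__main__":
    for doc in (named, numeric, predefined):
        print(doc, "->", "ok" if parses(doc) else "parse error")
```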
Hope that is useful and not teaching either of you to suck eggs! ;-)
Pete Cliff
OULS