Re: [Dspace-tech] utf8 and dspace

2008-03-10 Thread Mark H. Wood
On Sat, Mar 08, 2008 at 11:11:02AM +0800, Jayan Chirayath Kurian wrote:

 In a DSpace batch import, the importer stops at special characters
 (e.g. &). This can be resolved by converting them into the equivalent
 entity, &amp;. Is there any solution other than changing this
 manually?

Oh, that.  That's not a charset encoding (UTF-8) issue; it's an XML
encoding issue.  Well-formed XML can't have naked ampersands or left
angle brackets; they must be specified as coded character entities.
You'd have the same problem no matter what charset encoding you used.
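Rather than fixing each ampersand by hand, the metadata can be escaped when the batch is generated. The sketch below assumes the batch is built by a script. build_dublin_core and dcvalue are hypothetical helper names, and the <dublin_core>/<dcvalue element=... qualifier=...> layout is modeled on DSpace's Simple Archive Format, so check it against your DSpace version. It relies on Python's standard xml.sax.saxutils.escape, which rewrites &, < and > as &amp;, &lt; and &gt;.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def dcvalue(element: str, qualifier: str, text: str) -> str:
    """Build one <dcvalue> line; escape() rewrites &, < and > as entities."""
    # quoteattr() also escapes and quotes the attribute values.
    return "  <dcvalue element=%s qualifier=%s>%s</dcvalue>" % (
        quoteattr(element), quoteattr(qualifier), escape(text))

def build_dublin_core(fields) -> str:
    """fields: iterable of (element, qualifier, raw text) tuples (hypothetical helper)."""
    lines = ["<?xml version='1.0' encoding='utf-8'?>", "<dublin_core>"]
    lines += [dcvalue(e, q, t) for e, q, t in fields]
    lines.append("</dublin_core>")
    return "\n".join(lines) + "\n"

doc = build_dublin_core([("title", "none", "Salt & Pepper <draft>")])
print(doc)

# Sanity check: the result is well-formed, and a parser turns &amp; back into &.
root = ET.fromstring(doc.encode("utf-8"))
assert root.find("dcvalue").text == "Salt & Pepper <draft>"
```

Write the string out with open(path, "w", encoding="utf-8") so the bytes on disk match the encoding the declaration claims.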

There *are* charset encoding issues, often when building a batch by
cut'n'pasting from Windows editors or office tools.  I was advised to
add an XML PI to the head of the dublin_core.xml to specify the likely
encoding:

  <?xml version='1.0' encoding='windows-1252'?>

and that took care of all the section signs, em-dashes, accented
letters, and silly smart quotes.
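Why the declaration matters: an XML parser that finds no declaration (and no byte-order mark) assumes UTF-8, so a lone windows-1252 byte such as 0x93 (a left smart quote) is rejected as invalid. Below is a minimal sketch, assuming the files were saved by Windows tools in windows-1252. declare_encoding is a hypothetical helper, and Python's ElementTree stands in for whatever parser the importer actually uses.

```python
import xml.etree.ElementTree as ET

# Assumption: the batch files came from Windows tools that save in windows-1252.
DECLARATION = b"<?xml version='1.0' encoding='windows-1252'?>\n"

def declare_encoding(raw: bytes, declaration: bytes = DECLARATION) -> bytes:
    """Prepend an XML declaration unless the file already starts with one."""
    if raw.startswith(b"<?xml"):
        return raw  # keep whatever declaration is already there
    return declaration + raw

# Hypothetical dublin_core.xml body: smart quotes (0x93/0x94), an en dash
# (0x96) and a section sign (0xA7), saved without any declaration.
raw = (b"<dublin_core><dcvalue element='title' qualifier='none'>"
       b"\x93Quoted\x94 \x96 \xa7 5</dcvalue></dublin_core>")

try:
    ET.fromstring(raw)  # no declaration, so the parser assumes UTF-8
except ET.ParseError as err:
    print("without declaration:", err)

root = ET.fromstring(declare_encoding(raw))
print("with declaration:", root.find("dcvalue").text)
```

Declaring the encoding only relabels the bytes; if a batch mixes files from different tools, transcoding everything to UTF-8 may be the more robust fix.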

-- 
Mark H. Wood, Lead System Programmer   [EMAIL PROTECTED]
Typically when a software vendor says that a product is intuitive he
means the exact opposite.



-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] utf8 and dspace

2008-03-07 Thread Christian Voelker
Hello,

On 07.03.2008 at 16:49, LARC/J.L.Shipman/jshipman wrote:

 We are running DSpace 1.4.2, PostgreSQL 8, and Solaris 10.
 Is UTF-8 a necessity for DSpace? My understanding is
 that Sun's en_US.ISO8859-1 includes most of UTF-8 except
 for the Far East languages. Any help is appreciated.

@nasa.gov: Working at an international site? I don't
know whether UTF-8 is strictly required everywhere,
but ISO 8859-1 is not even sufficient for European
text: it lacks the Euro sign, for example (which
requires ISO 8859-15 at least).
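A quick way to check what a charset can hold is to try encoding sample text in it. ISO 8859-1 and ISO 8859-15 are single-byte charsets with at most 256 characters each, while UTF-8 can encode every Unicode character. The sketch uses only Python's built-in codecs; encodable is a hypothetical helper and the sample strings are arbitrary.

```python
def encodable(text: str, charset: str) -> bool:
    """True if every character of text has a code in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

CHARSETS = ("iso-8859-1", "iso-8859-15", "utf-8")
SAMPLES = {
    "euro sign": "€",
    "French oe ligature": "œuvre",
    "German umlauts": "Grüße",
    "Japanese": "東京",
}

for label, text in SAMPLES.items():
    ok = [cs for cs in CHARSETS if encodable(text, cs)]
    print(f"{label:20} {ok}")

# The euro sign is the single byte 0xA4 in ISO 8859-15 and three bytes in UTF-8.
print("€".encode("iso-8859-15"), "€".encode("utf-8"))
```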

What I do know is that everything has been getting better
every day since we committed to UTF-8, strictly and
exclusively. We leapt into live service from a test
installation and had old ISO 8859 data on our site
for over a year. When I switched to a new server
in January, I made sure that everything is UTF-8 from
now on. I mean everything, including file names at
the system level (this is the default in current
Debian, BTW). I still have some crap inside: last week
I found a glitch in the search index, and some in the
email templates too. But everywhere I got rid of the
old stuff, I am really happy with the flawless
functionality and display.
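An audit along these lines can be sketched by walking a directory tree and flagging file names, and optionally file contents, that are not valid UTF-8. This is a minimal sketch with hypothetical helper names (is_utf8, find_non_utf8). It reads each file whole, so it suits small files such as email templates rather than bitstreams, and pointing it at something like your DSpace config/emails directory is an assumption about your install layout. Pure-ASCII files always pass, which is correct since ASCII is valid UTF-8.

```python
import os
import tempfile

def is_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def find_non_utf8(root: bytes, check_contents: bool = False):
    """Yield (path, reason) for names, and optionally contents, that are not UTF-8."""
    # Walking with a bytes root makes os.walk return raw bytes names,
    # so undecodable names are seen as-is instead of being escaped.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8(name):
                yield os.path.join(dirpath, name), "name"
        if check_contents:
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    if not is_utf8(f.read()):
                        yield path, "contents"

# Demo on a throwaway directory: one UTF-8 file, one saved as Latin-1.
with tempfile.TemporaryDirectory() as tmp:
    base = os.fsencode(tmp)
    with open(os.path.join(base, b"good.txt"), "wb") as f:
        f.write("Grüße".encode("utf-8"))
    with open(os.path.join(base, b"legacy.txt"), "wb") as f:
        f.write("Grüße".encode("latin-1"))
    problems = [(os.path.basename(p), why)
                for p, why in find_non_utf8(base, check_contents=True)]
    print(problems)
```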

Come on, ISO 8859 is from the last century. Its time
is gone. UTF-8 is the way to go; there is no arguing
that. You are running a system made for long-term
preservation.

Bye, Christian




Re: [Dspace-tech] utf8 and dspace

2008-03-07 Thread Mark H. Wood
Let's turn the question around.  Is UTF-8 a problem for you?  Why?
What would you need to make it no longer a problem?

-- 
Mark H. Wood, Lead System Programmer   [EMAIL PROTECTED]
Typically when a software vendor says that a product is intuitive he
means the exact opposite.


