Hi Dimitris
 
Just to confirm, 
 
- were you specifying the encoding to the Tomcat JVM (i.e. using -Dfile.encoding=utf-8)?
- which SQL database (and version) are you using?
 
Thanks
Steve
 
 
-----Original Message-----
From: Dimitris Gavrilis [mailto:gavri...@gmail.com] 
Sent: 26 November 2010 09:41
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] Proia multilingual - java.io.UTFDataFormatException


Hi Steve,

I deleted the tmp/proai folder and truncated the proai database, but I
still get the same error (see the log below).



proai.error.ServerException: Error parsing record xml
        at proai.cache.ParsedRecord.<init>(ParsedRecord.java:70)
        at proai.cache.Worker.attempt(Worker.java:111)
        at proai.cache.Worker.run(Worker.java:51)
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
        at proai.cache.ParsedRecord.<init>(ParsedRecord.java:62)
        ... 2 more
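
For anyone reading the trace above: the message says the parser saw the lead byte of a two-byte UTF-8 sequence followed by a byte that is not a valid continuation byte, which usually means the bytes were produced in some other encoding. Here is a minimal sketch of checking a byte array with the JDK's strict UTF-8 decoder. The class name, the sample text, and the two "bad" bytes (what ISO-8859-7 would produce for the Greek letters "ήν") are assumptions for illustration and are not taken from ProAI.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true only if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Greek sample text, written with escapes so the source file
        // encoding does not matter.
        String greek = "\u0391\u03b8\u03ae\u03bd\u03b1";
        byte[] good = greek.getBytes(StandardCharsets.UTF_8);

        // Hypothetical legacy-encoded bytes: 0xDE looks like a two-byte
        // UTF-8 lead byte, but 0xED is not a continuation byte.
        byte[] bad = {(byte) 0xDE, (byte) 0xED};

        System.out.println("UTF-8 bytes valid:  " + isValidUtf8(good));
        System.out.println("legacy bytes valid: " + isValidUtf8(bad));
    }
}
```

Running a check like this over the bytes of the stored record would show whether the data is already malformed before it reaches the parser.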




On Thu, Nov 25, 2010 at 1:11 PM, West, Graeme <graeme.w...@gcu.ac.uk> wrote:


Hi Dimitris,
Did this error occur after making the encoding change?

It may be a good idea to stop your servlet container, drop/truncate the
tables from the ProAI database, delete the ProAI temporary files directory
(by default /tmp/proai ), and then restart your servlet container. This will
rebuild the ProAI database completely and ensure that you're not seeing
cached errors.

Regards,

Graeme


On 25 Nov 2010, at 11:00, Dimitris Gavrilis wrote:

Dear Steve,

Thanks for your help. I changed the header (UTF-8) at the top of the file
as you suggested, but I still get the same error. The file seems OK when
accessed through Fedora
(http://localhost:8080/fedora/objects/iid:1/datastreams/mods/content).

Below is the error from Fedora's console:


proai.error.ServerException: Error parsing record xml
       at proai.cache.ParsedRecord.<init>(ParsedRecord.java:70)
       at proai.cache.Worker.attempt(Worker.java:111)
       at proai.cache.Worker.run(Worker.java:51)
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
       at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
       at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
       at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
       at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
       at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
       at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
       at proai.cache.ParsedRecord.<init>(ParsedRecord.java:62)
       ... 2 more




On Thu, Nov 25, 2010 at 12:21 PM, Steve Bayliss <stephen.bayl...@acuityunlimited.net> wrote:
Hi Dimitris


It would certainly be worthwhile trying Graeme's suggestion, although I
suspect that if Fedora didn't determine the correct encoding on ingest then
this would cause problems elsewhere.  (In any case you should correct this
incorrect encoding declaration to UTF-8).

I've taken a look at the proai oaiprovider source, and there is some
"unsafe" code in there where the default platform encoding will be used. (eg
FedoraOAIDriver.java line 275)

1) could you provide a full log of the exception (ie the full stack trace)
2) could you try setting the JVM default encoding by using
-Dfile.encoding=utf-8 (eg add this to CATALINA_OPTS)
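
To illustrate what "unsafe" means here: in Java, calls such as text.getBytes() or new String(bytes) with no charset argument use the JVM's default charset, so the result depends on how the JVM was started. The following is a rough sketch only, not the ProAI code; the class and method names are made up for the example, and ISO-8859-1 stands in for whatever non-UTF-8 default a given platform might have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    /** Encodes and decodes with the same, explicitly named charset. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    /**
     * Encodes with one charset and decodes with another, which is what
     * happens when a writer relies on the platform default and the
     * reader assumes UTF-8.
     */
    static String mismatched(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        // Greek sample text, written with escapes so the source file
        // encoding does not matter.
        String greek = "\u0391\u03b8\u03ae\u03bd\u03b1";

        // What the JVM will use when no charset is passed explicitly.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Explicit UTF-8 on both sides preserves the text.
        System.out.println("explicit UTF-8 preserved: "
                + roundTrip(greek, StandardCharsets.UTF_8).equals(greek));

        // A non-UTF-8 writer (assumed here to be ISO-8859-1) loses it.
        System.out.println("mismatched preserved:     "
                + mismatched(greek, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(greek));
    }
}
```

Setting -Dfile.encoding=utf-8 changes what the first two lines print; passing the charset explicitly, as roundTrip does, removes the dependency on that setting altogether.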

Thanks
Steve

-----Original Message-----

From: West, Graeme [mailto:graeme.w...@gcu.ac.uk]
Sent: 25 November 2010 09:44
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] Proia multilingual - java.io.UTFDataFormatException


Hi Dimitris,
I notice that on the first line, your XML declaration states:

<?xml version="1.0" encoding="UTF8"?>

This should be:
<?xml version="1.0" encoding="UTF-8"?>

ProAI is probably rejecting the documents because of this 'unknown'
encoding.

Hope this helps.

Regards,

Graeme West
Digital Repository Developer
Information Services
Glasgow Caledonian University

graeme.w...@gcu.ac.uk




On 24 Nov 2010, at 08:31, Dimitris Gavrilis wrote:

Hi Steve,

I'm attaching an xml sample of a record that produces this error.

Thanks,
Dimitris.

On Wed, Nov 24, 2010 at 9:55 AM, Steve Bayliss <stephen.bayl...@acuityunlimited.net> wrote:
Hi Dimitris

Do you have an example object FOXML file that could be used to reproduce
this?

Thanks
Steve


-----Original Message-----
From: Dimitris Gavrilis [mailto:gavri...@gmail.com]
Sent: 23 November 2010 15:17
To: fedora-commons-users@lists.sourceforge.net
Subject: [fcrepo-user] Proia multilingual - java.io.UTFDataFormatException

Hi,

I've set up Fedora with ProAI, and whenever ProAI tries to parse non-English
records (Greek) I get a java.io.UTFDataFormatException. Although I've seen
this problem reported elsewhere, I haven't managed to find a solution. When I
exclude non-English text, ProAI works fine.

Thanks in advance,
Dimitris.

------------------------------------------------------------------------------
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users



Email has been scanned for viruses by Altman Technologies' email management
service<http://www.altman.co.uk/emailsystems>

<iid_1_mods.xml>



Glasgow Caledonian University is a registered Scottish charity, number
SC021474

Winner: Times Higher Education's Widening Participation Initiative of the
Year 2009 and Herald Society's Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html


