Dear Steve,
I have two installations (one on Debian Linux and one on Windows 7). The
problem occurred on the Windows installation (I haven't tried it on Linux
yet). You can send me the tool whenever you want.
Thanks again,
Dimitris.
On Fri, Nov 26, 2010 at 2:18 PM, Steve Bayliss <
stephen.bayl...@acuityunlimited.net> wrote:
> Hi Dimitris
>
> Thanks very much for confirming this.
>
> From my inspection of the source, there is some "unsafe" code where the
> platform default encoding will be used rather than UTF-8, which in some cases
> could cause this problem (this is particularly likely on Windows).
>
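To illustrate the kind of failure described above, here is a minimal sketch. It is not the ProAI code; the class name DefaultEncodingPitfall and the choice of windows-1253 as the stand-in for a Greek Windows default are assumptions made for the example (the actual default on the affected machine is not stated in this thread). Greek text encoded with a single-byte Windows code page is not valid UTF-8, so a strict UTF-8 reader rejects it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DefaultEncodingPitfall {

    /** Returns true if the bytes decode cleanly as strict UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A short Greek word, written with escapes so the source file
        // itself does not depend on any particular encoding.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";

        // Assumption: windows-1253 stands in for the platform default of a
        // Greek Windows machine. It is not one of the charsets every JVM
        // is required to support, although standard JDKs ship it.
        Charset assumedDefault = Charset.forName("windows-1253");

        byte[] viaDefault = greek.getBytes(assumedDefault);
        byte[] viaUtf8 = greek.getBytes(StandardCharsets.UTF_8);

        System.out.println("default-encoded bytes valid UTF-8? " + isValidUtf8(viaDefault));
        System.out.println("UTF-8-encoded bytes valid UTF-8?   " + isValidUtf8(viaUtf8));
    }
}
```

In windows-1253 the first two letters become the bytes 0xC5 0xEB. 0xC5 looks like the start of a 2-byte UTF-8 sequence, but 0xEB is not a valid continuation byte, which is consistent with the "Invalid byte 2 of 2-byte UTF-8 sequence" message in the stack trace.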
> Out of interest, what OS are you running on?
>
> Are you able to identify the default encoding? In Java this is
> java.nio.charset.Charset.defaultCharset(); I can send you a small utility
> to find this out if you are willing to provide this feedback. This would
> be useful information so that (a) the bug can be reproduced and (b) the fix
> can be correctly tested.
>
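For reference, a utility of this kind can be very small. The following is only a sketch and not the tool Steve offers to send; the class name ShowDefaultCharset is made up for illustration. It prints the JVM default charset and the file.encoding system property, which is the property the -Dfile.encoding option sets.

```java
import java.nio.charset.Charset;

final class ShowDefaultCharset {

    /** Builds a two-line report of the encoding-related JVM settings. */
    static String report() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + System.lineSeparator()
                + "file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

To see what Tomcat itself uses, it would need to be run with the same JVM and the same options (JAVA_OPTS or CATALINA_OPTS) as the servlet container.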
> Now that you have verified that setting the encoding resolves the issue, this
> indicates that there is a bug. I have raised
> https://jira.duraspace.org/browse/FCREPO-832 for this and attached your
> sample XML.
>
> The work-around is to set the JVM file.encoding - I have added a note to
> the oaiprovider page about doing this.
>
> Regards
> Steve
>
>
> -----Original Message-----
> *From:* Dimitris Gavrilis [mailto:gavri...@gmail.com]
> *Sent:* 26 November 2010 11:29
> *To:* Support and info exchange list for Fedora users.
> *Subject:* Re: [fcrepo-user] Proia multilingual-
> java.io.UTFDataFormatException
>
> Dear Steve,
>
> It finally worked. I think it was the -Dfile.encoding=utf-8 in the
> JAVA_OPTS.
>
> Thank you very much for your assistance,
> Dimitris.
>
>
> On Fri, Nov 26, 2010 at 12:02 PM, Steve Bayliss <
> stephen.bayl...@acuityunlimited.net> wrote:
>
>> Hi Dimitris
>>
>> Just to confirm,
>>
>> - were you specifying the encoding to the Tomcat JVM (ie using
>> -Dfile.encoding=utf-8)?
>> - which SQL database (and version) are you using?
>>
>> Thanks
>> Steve
>>
>>
>> -----Original Message-----
>> *From:* Dimitris Gavrilis [mailto:gavri...@gmail.com]
>> *Sent:* 26 November 2010 09:41
>> *To:* Support and info exchange list for Fedora users.
>> *Subject:* Re: [fcrepo-user] Proia multilingual-
>> java.io.UTFDataFormatException
>>
>> Hi Steve,
>>
>> I deleted the tmp/proai folder and truncated the proai database but I
>> still get the same error (see the log below).
>>
>>
>>
>> proai.error.ServerException: Error parsing record xml
>>     at proai.cache.ParsedRecord.<init>(ParsedRecord.java:70)
>>     at proai.cache.Worker.attempt(Worker.java:111)
>>     at proai.cache.Worker.run(Worker.java:51)
>> Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
>>     at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
>>     at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>>     at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>>     at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
>>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>>     at proai.cache.ParsedRecord.<init>(ParsedRecord.java:62)
>>     ... 2 more
>>
>>
>>
>> On Thu, Nov 25, 2010 at 1:11 PM, West, Graeme <graeme.w...@gcu.ac.uk> wrote:
>>
>>> Hi Dimitris,
>>> Did this error occur after making the encoding change?
>>>
>>> It may be a good idea to stop your servlet container, drop/truncate the
>>> tables from the ProAI database, delete the ProAI temporary files directory
>>> (by default /tmp/proai ), and then restart your servlet container. This will
>>> rebuild the ProAI database completely and ensure that you're not seeing
>>> cached errors.
>>>
>>> Regards,
>>>
>>> Graeme
>>>
>>> On 25 Nov 2010, at 11:00, Dimitris Gavrilis wrote:
>>>
>>> Dear Steve,
>>>
>>> Thanks for your help. I did change the header (UTF-8) at the top of the
>>> file as you suggested but I still get the same error. The file seems OK when
>>> accessed through Fedora (
>>> http://localhost:8080/fedora/objects/iid:1/datastreams/mods/content).
>>>
>>> I'm attaching below the error from Fedora's console:
>>>
>>>
>>> proai.error.ServerException: Error parsing record xml
>>>     at proai.cache.ParsedRecord.<init>(ParsedRecord.java:70)
>>>     at proai.cache.Worker.attempt(Worker.java:111)
>>>     at proai.cache.Worker.run(Worker.java:51)
>>> Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.
>>>     at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
>>>     at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>>>     at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>>>     at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>>>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>>>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
>>>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>>>     at proai.cache.ParsedRecord.<init>(ParsedRecord.java:62)
>>>     ... 2 more
>>>
>>>
>>>
>>> On Thu, Nov 25, 2010 at 12:21 PM, Steve Bayliss
>>> <stephen.bayl...@acuityunlimited.net> wrote:
>>> Hi Dimitris
>>>
>>> It would certainly be worthwhile trying Graeme's suggestion, although I
>>> suspect that if Fedora didn't determine the correct encoding on ingest
>>> then this would cause problems elsewhere. (In any case you should correct
>>> this incorrect encoding declaration to UTF-8.)
>>>
>>> I've taken a look at the proai oaiprovider source, and there is some
>>> "unsafe" code in there where the default platform encoding will be used
>>> (eg FedoraOAIDriver.java line 275).
>>>
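As a sketch of the "safe" alternative to relying on the platform default: pass the charset explicitly whenever bytes are converted to or from characters. This is not the code at FedoraOAIDriver.java line 275, which is not quoted in this thread; the class ExplicitUtf8Io and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class ExplicitUtf8Io {

    /** Encodes text as UTF-8 regardless of the platform default. */
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // new OutputStreamWriter(out) without a charset argument would
        // silently use the platform default encoding instead.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decodes UTF-8 bytes regardless of the platform default. */
    static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A short Greek word, written with escapes.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
        String roundTripped = readUtf8(writeUtf8(greek));
        System.out.println("round trip preserved text: " + roundTripped.equals(greek));
    }
}
```

With the charset given explicitly, the result no longer depends on file.encoding, so the -Dfile.encoding workaround would not be needed for code written this way.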
>>> 1) could you provide a full log of the exception (ie the full stack trace)
>>> 2) could you try setting the JVM default encoding by using
>>> -Dfile.encoding=utf-8 (eg add this to CATALINA_OPTS)
>>>
>>> Thanks
>>> Steve
>>>
>>> -----Original Message-----
>>> From: West, Graeme [mailto:graeme.w...@gcu.ac.uk]
>>> Sent: 25 November 2010 09:44
>>> To: Support and info exchange list for Fedora users.
>>> Subject: Re: [fcrepo-user] Proia multilingual-
>>> java.io.UTFDataFormatException
>>>
>>>
>>> Hi Dimitris,
>>> I notice that on the first line, your XML declaration states:
>>>
>>> <?xml version="1.0" encoding="UTF8"?>
>>>
>>> This should be:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>>
>>> ProAI is probably rejecting the documents because of this 'unknown'
>>> encoding.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>>
>>> Graeme West
>>> Digital Repository Developer
>>> Information Services
>>> Glasgow Caledonian University
>>> graeme.w...@gcu.ac.uk
>>>
>>>
>>>
>>> On 24 Nov 2010, at 08:31, Dimitris Gavrilis wrote:
>>>
>>> Hi Steve,
>>>
>>> I'm attaching an xml sample of a record that produces this error.
>>>
>>> Thanks,
>>> Dimitris.
>>>
>>> On Wed, Nov 24, 2010 at 9:55 AM, Steve Bayliss
>>> <stephen.bayl...@acuityunlimited.net> wrote:
>>> Hi Dimitris
>>>
>>> Do you have an example object FOXML file that could be used to reproduce
>>> this?
>>>
>>> Thanks
>>> Steve
>>>
>>>
>>> -----Original Message-----
>>> From: Dimitris Gavrilis [mailto:gavri...@gmail.com]
>>> Sent: 23 November 2010 15:17
>>> To: fedora-commons-users@lists.sourceforge.net
>>> Subject: [fcrepo-user] Proia multilingual -
>>> java.io.UTFDataFormatException
>>>
>>> Hi,
>>>
>>> I've set up Fedora with ProAI and whenever ProAI tries to parse
>>> non-English records (Greek) I get a java.io.UTFDataFormatException.
>>> Although I've seen that this problem exists, I haven't managed to find
>>> a solution. When I exclude non-English text, ProAI works fine.
>>>
>>> Thanks in advance,
>>> Dimitris.
>>>
>>>
>>> ----------------------------------------------------------------------------
>>> --
>>> Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
>>> Tap into the largest installed PC base & get more eyes on your game by
>>> optimizing for Intel(R) Graphics Technology. Get started today with the
>>> Intel(R) Software Partner Program. Five $500 cash prizes are up for
>>> grabs.
>>> http://p.sf.net/sfu/intelisp-dev2dev
>>> _______________________________________________
>>> Fedora-commons-users mailing list
>>> Fedora-commons-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>>
>>>
>>>
>>> Email has been scanned for viruses by Altman Technologies' email
>>> management
>>> service<http://www.altman.co.uk/emailsystems>
>>>
>>>
>>> <iid_1_mods.xml>
>>>
>>>
>>> Glasgow Caledonian University is a registered Scottish charity, number
>>> SC021474
>>>
>>> Winner: Times Higher Education's Widening Participation Initiative of the
>>> Year 2009 and Herald Society's Education Initiative of the Year 2009
>>>
>>> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>>>
>>>