Hi,

You have to give the Charset when creating the Writer. If you give no
charset, Java uses the platform default. This question has nothing to do
with Lucene, it is better suited at an XML  or JAVA general forum.

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> Sent: Monday, March 28, 2011 9:04 AM
> To: java-user@lucene.apache.org
> Subject: file formats: MacRoman and UTF-8...
> 
> When I run my Lucene app and a parse a xml file I get the following error
due
> to some fonts such as "é" written in the text file.
> 
> If I save the text file as UTF-8 with my text editor I don't have this
issue, but
> when I create it with a java app, it is saved as MacRoman.
> 
> How can I specify a different format with Java instead ?
> 
> thanks
> 
> [CODE]Exception in thread "main"
> com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceExcepti
> on:
> Invalid byte 1 of 1-byte UTF-8 sequence.
> at
> com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Re
> ader.java:684)
> at
> com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.ja
> va:554)
> at
> com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityS
> canner.java:1742)
> at
> com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEn
> tityScanner.java:1416)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm
> pl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2
> 792)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(X
> MLDocumentScannerImpl.java:648)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm
> pl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML
> 11Configuration.java:808)
> at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML
> 11Configuration.java:737)
> at
> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.jav
> a:119)
> at
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Abstra
> ctSAXParser.java:1205)
> at
> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.par
> se(SAXParserImpl.java:522)
> at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE]


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to