Hi, You have to give the Charset when creating the Writer. If you give no charset, Java uses the platform default. This question has nothing to do with Lucene, it is better suited at an XML or JAVA general forum.
Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > Sent: Monday, March 28, 2011 9:04 AM > To: java-user@lucene.apache.org > Subject: file formats: MacRoman and UTF-8... > > When I run my Lucene app and a parse a xml file I get the following error due > to some fonts such as "é" written in the text file. > > If I save the text file as UTF-8 with my text editor I don't have this issue, but > when I create it with a java app, it is saved as MacRoman. > > How can I specify a different format with Java instead ? > > thanks > > [CODE]Exception in thread "main" > com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceExcepti > on: > Invalid byte 1 of 1-byte UTF-8 sequence. > at > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Re > ader.java:684) > at > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.ja > va:554) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityS > canner.java:1742) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEn > tityScanner.java:1416) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm > pl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2 > 792) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(X > MLDocumentScannerImpl.java:648) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm > pl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML > 11Configuration.java:808) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML > 11Configuration.java:737) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.jav > a:119) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Abstra > ctSAXParser.java:1205) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.par > se(SAXParserImpl.java:522) > at org.apache.commons.digester.Digester.parse(Digester.java:1871) > at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE] --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org