Hi, Replace the "stupid": writer = new BufferedWriter(new FileWriter(fileOutput));
by: writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileOutput), "UTF-8")); Unfortunately, you cannot give a charset to FileWriter itself. ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > Sent: Monday, March 28, 2011 9:17 AM > To: java-user@lucene.apache.org > Subject: Re: file formats: MacRoman and UTF-8... > > hi, I'm using my own code: > > > > Writer writer = null; > > try { > //File fileOutput = new File("output.trectext"); File fileOutput = new > File(args[1]); writer = new BufferedWriter(new FileWriter(fileOutput)); > writer.write(contents.toString()); > } catch (FileNotFoundException e) { > e.printStackTrace(); > } catch (IOException e) { > e.printStackTrace(); > } finally { > try { > if (writer != null) { > writer.close(); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > > > > > > On 28 March 2011 09:13, Paul Libbrecht <p...@hoplahup.net> wrote: > > > java -Dfile.encoding=utf-8 > > should do the trick. > > > > Or... which java app are you using? > > > > paul > > > > > > Le 28 mars 2011 à 09:03, Patrick Diviacco a écrit : > > > > > When I run my Lucene app and a parse a xml file I get the following > > > error due to some fonts such as "é" written in the text file. > > > > > > If I save the text file as UTF-8 with my text editor I don't have > > > this issue, but when I create it with a java app, it is saved as MacRoman. > > > > > > How can I specify a different format with Java instead ? > > > > > > thanks > > > > > > [CODE]Exception in thread "main" > > > > > > com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceExcepti > on: > > > Invalid byte 1 of 1-byte UTF-8 sequence. > > > at > > > > > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8 > > Reader.java:684) > > > at > > > > > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader. > > java:554) > > > at > > > > > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntit > > yScanner.java:1742) > > > at > > > > > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLE > > ntityScanner.java:1416) > > > at > > > > > > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm > pl > > > $FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:279 > 2) > > > at > > > > > > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(X > M > > LDocumentScannerImpl.java:648) > > > at > > > > > > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm > pl > > .scanDocument(XMLDocumentFragmentScannerImpl.java:511) > > > at > > > > > > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XM > > L11Configuration.java:808) > > > at > > > > > > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XM > > L11Configuration.java:737) > > > at > > > > > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.j > > ava:119) > > > at > > > > > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Abs > > tractSAXParser.java:1205) > > > at > > > > > > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.pa > > rse(SAXParserImpl.java:522) > > > at org.apache.commons.digester.Digester.parse(Digester.java:1871) > > > at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE] > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org