https://bz.apache.org/bugzilla/show_bug.cgi?id=63813

            Bug ID: 63813
           Summary: Special character (greater than equal) converts to '('
                    text in word documents
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: teresa....@linguamatics.com
  Target Milestone: ---

Version:

POI 4.1.0

I have documents (either 'doc' or 'docx') that have a special character for
'greater than equal' and using codes in 'WordToHtmlConverter', I see those
characters are converted into '('.

I tried with the latest apache poi release 4.1.0.


My java code is:


public class TestWordtoHtmlConverter {

    public static void main(String[] args ) {
        try {
        HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new
FileInputStream(args[0]));

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .newDocument());

        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();

        String result = new String(out.toByteArray());
        System.out.println(result);
      } catch (Exception e) {
      }

Is there anyway I can correctly identify these symbols?


In the sample document, I am interested in getting 'bad one'.


Thanks

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org

Reply via email to