https://bz.apache.org/bugzilla/show_bug.cgi?id=63813

--- Comment #3 from Axel Howind <a...@dua3.com> ---
When reading the Word file, text pieces are read by converting `byte[]` to
`String` in `buildInitSB()`. I investigated the raw data passed to that method:

- According to the Unicode table, the "greater-than or equal to" sign has the
code 0x2265, which is also what I see in the debugger right before the "good
one" bytes.

- Right before "bad one" there is a 0x0028, which in Unicode is the left
parenthesis.

So the wrong character is already in the raw data, which suggests that the
error happens at a very low level when reading the byte stream.
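
To make the byte-to-character step concrete, here is a minimal sketch of how
the two observed values come out of a `byte[]`. It is not POI's
`buildInitSB()` code; the class and method names are hypothetical, and it
assumes the text piece is stored as UTF-16LE (two bytes per character, low
byte first).

```java
import java.nio.charset.StandardCharsets;

public class TextPieceDecodeSketch {

    // Decode a raw byte range as UTF-16LE (assumed encoding of the text piece).
    static String decodeUtf16Le(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    // Format each UTF-16 code unit as 0xNNNN, for comparison with a debugger view.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] good = {0x65, 0x22}; // low byte first: code unit 0x2265
        byte[] bad = {0x28, 0x00};  // low byte first: code unit 0x0028

        System.out.println(codeUnits(decodeUtf16Le(good))); // 0x2265
        System.out.println(codeUnits(decodeUtf16Le(bad)));  // 0x0028
    }
}
```

If the bytes in front of "bad one" really are `28 00`, the decoding step
itself produces a left parenthesis as expected, so the question is why those
bytes are in the stream in the first place.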

-----

Additional findings: LibreOffice doesn't render the symbol in front of "bad
one" at all. Pages displays the correct symbol.

-----

Extracting the file on the command line yields:

axel@xiaolong tmp % unzip ../symbol_test.doc 
Archive:  ../symbol_test.doc
warning [../symbol_test.doc]:  10574 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: theme/theme/themeManager.xml  
  inflating: theme/theme/theme1.xml  
  inflating: theme/theme/_rels/themeManager.xml.rels  

Could it be that the file is corrupt? Compare with a simple test document:

axel@xiaolong tmp % unzip ../Test.docx 
Archive:  ../Test.docx
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: word/_rels/document.xml.rels  
  inflating: word/document.xml       
  inflating: word/theme/theme1.xml   
  inflating: word/settings.xml       
  inflating: docProps/core.xml       
  inflating: word/fontTable.xml      
  inflating: word/webSettings.xml    
  inflating: word/styles.xml         
  inflating: docProps/app.xml

But since Apple Pages renders it correctly and you said that you have multiple
such documents, maybe I am missing something.
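
One thing that might help here: a binary `.doc` is normally an OLE2 compound
file, not a zip archive, so the "extra bytes at beginning" warning could just
mean that unzip found a zip stream embedded somewhere inside the file. A rough
way to check is to look at the first bytes of each file. The sketch below uses
only the JDK (Java 9+ for `readNBytes`); the class name is made up, and it only
inspects the leading signature, nothing more.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class ContainerSniffSketch {

    // Signature at the start of an OLE2 compound file (binary .doc, .xls, ...).
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature at the start of a zip archive (.docx, ...).
    static final byte[] ZIP = {'P', 'K', 3, 4};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classify a file by its leading bytes only.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "OLE2";
        }
        if (startsWith(head, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] head = new byte[8];
            int n;
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                n = in.readNBytes(head, 0, head.length);
            }
            System.out.println(name + ": " + sniff(Arrays.copyOf(head, n)));
        }
    }
}
```

Run it as `java ContainerSniffSketch.java ../symbol_test.doc ../Test.docx`. If
the `.doc` reports OLE2, the partial unzip listing above says nothing about
corruption.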

Anyway, I'm out of this one.

