Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org

2005-05-05 Thread Antiword team
On Wednesday 04 May 2005 16:03, Martin Michlmayr wrote:
> Package: antiword
> Version: 0.35-1
> Severity: normal
>
> I created a simple document with OpenOffice.Org 1.1.2 and saved it as
> a Word document.  When I tried to look at it with antiword, I only got
> the error:
>
> | I'm afraid the text stream of this file is too small to handle.
>
> This is both with antiword 0.35 and 0.36.1 and with documents stored
> with OpenOffice.Org as Word 6.0, Word 95 and Word 97/2000/XP.  I tried
> the file in MS Word 2003 SP1 on Windows XP and it loaded without any
> problems, so this seems to be a problem in antiword.
>
> An example file is attached below.  It's a fairly simple file -
> basically a bullet list with a number of items.
>
> (Thanks for antiword, by the way.  I'm a text-based user and really
> appreciate not having to load OOo just to view DOC documents people
> send to me.)

This is not a bug, it is a missing feature.

Let me explain.
Inside a Word file the text is stored in a so-called text stream. There are
two possible text streams: a small block text stream and a large block text
stream. The small blocks are 64 bytes in size, the large blocks are 512
bytes in size. Because of the difference in size, Antiword would need two
different methods for reading those two text streams. The method for
reading the small block text stream has not been implemented yet. The
result is that Word files with no large block text stream cannot be read by
Antiword. Such Word files are mostly smaller than about 12 kilobytes and
have less than 1024 bytes of text.
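
To illustrate the two block sizes, here is a minimal Python sketch (not
Antiword's code, which is written in C). It assumes the Word file is an OLE2
compound file and that the field offsets follow the published compound file
header layout: the two block sizes are stored as shift counts at byte
offsets 30 and 32, and the size cutoff below which a stream is kept in
small blocks is at offset 56. The function names and the demo header are
hypothetical, and the cutoff of 4096 is only the value such headers
commonly carry.

```python
import struct

# Signature found at the start of an OLE2 compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_block_sizes(header: bytes):
    """Return (large_block_size, small_block_size, cutoff) from a header.

    Offsets are an assumption based on the published compound file
    header layout; they are not taken from the Antiword sources.
    """
    if len(header) < 512 or header[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE2 compound file header")
    # Block sizes are stored as powers of two (shift counts).
    large_shift, small_shift = struct.unpack_from("<HH", header, 30)
    # Streams shorter than this cutoff are stored in small blocks.
    (cutoff,) = struct.unpack_from("<I", header, 56)
    return 1 << large_shift, 1 << small_shift, cutoff


def uses_small_blocks(stream_size: int, cutoff: int) -> bool:
    """True if a stream of this size would live in the small block stream."""
    return stream_size < cutoff


def make_demo_header() -> bytes:
    """Build a synthetic header for demonstration (hypothetical values)."""
    header = bytearray(512)
    header[:8] = CFB_SIGNATURE
    struct.pack_into("<HH", header, 30, 9, 6)   # 2**9 = 512, 2**6 = 64
    struct.pack_into("<I", header, 56, 4096)    # commonly seen cutoff
    return bytes(header)


if __name__ == "__main__":
    large, small, cutoff = read_block_sizes(make_demo_header())
    print(large, small, cutoff)
    print(uses_small_blocks(3000, cutoff))
```

A reader that only follows the 512-byte block chain has nothing to read
when the stream falls under the cutoff, which is the situation described
above.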

The reason for not implementing the missing feature is simple. Word
documents that use the small block text stream cannot be produced by Word
for Windows (all versions), but only by Word for Mac, and now by OpenOffice.
Note that these documents can be read by all versions of Word.

Kind Regards,
Adri van Os

-- 
The Antiword Team [EMAIL PROTECTED]
http://www.winfield.demon.nl/index.html for version 0.36 (16 Oct 2004)


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org

2005-05-05 Thread Martin Michlmayr
retitle 307657 please support small block text streams
severity 307657 wishlist
thanks

* Antiword team [EMAIL PROTECTED] [2005-05-05 20:19]:
> This is not a bug, it is a missing feature.

OK.

> Let me explain.
...
> The reason for not implementing the missing feature is simple. Word
> documents that use the small block text stream cannot be produced by Word
> for Windows (all versions), but only by Word for Mac, and now by OpenOffice.
> Note that these documents can be read by all versions of Word.

Thanks for the explanation.  I hope that you will get around to
implementing this feature at some point, and I'm therefore leaving
this bug report open as a feature request.
-- 
Martin Michlmayr
http://www.cyrius.com/





Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org

2005-05-04 Thread Martin Michlmayr
Package: antiword
Version: 0.35-1
Severity: normal

I created a simple document with OpenOffice.Org 1.1.2 and saved it as
a Word document.  When I tried to look at it with antiword, I only got
the error:

| I'm afraid the text stream of this file is too small to handle.

This is both with antiword 0.35 and 0.36.1 and with documents stored
with OpenOffice.Org as Word 6.0, Word 95 and Word 97/2000/XP.  I tried
the file in MS Word 2003 SP1 on Windows XP and it loaded without any
problems, so this seems to be a problem in antiword.

An example file is attached below.  It's a fairly simple file -
basically a bullet list with a number of items.



(Thanks for antiword, by the way.  I'm a text-based user and really
appreciate not having to load OOo just to view DOC documents people
send to me.)

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.10-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages antiword depends on:
ii  libc6   2.3.2.ds1-21 GNU C Library: Shared libraries an

-- no debconf information

-- 
Martin Michlmayr
http://www.cyrius.com/


xx.doc
Description: MS-Word document