Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org
On Wednesday 04 May 2005 16:03, Martin Michlmayr wrote:
> Package: antiword
> Version: 0.35-1
> Severity: normal
>
> I created a simple document with OpenOffice.Org 1.1.2 and saved it as a
> Word document. When I tried to look at it with antiword, I only got the
> error:
>
> | I'm afraid the text stream of this file is too small to handle.
>
> This is both with antiword 0.35 and 0.36.1 and with documents stored
> with OpenOffice.Org as Word 6.0, Word 95 and Word 97/2000/XP. I tried
> the file in MS Word 2003 SP1 on Windows XP and it loaded without any
> problems, so this seems to be a problem in antiword.
>
> An example file is attached below. It's a fairly simple file -
> basically a bullet list with a number of items.
>
> (Thanks for antiword, by the way. I'm a text-based user and really
> appreciate not having to load OOo just to view DOC documents people
> send to me.)

This is not a bug, it is a missing feature.

Let me explain. Inside a Word file the text is stored in a so-called text stream. There are two possible text streams: a small block text stream and a large block text stream. The small blocks are 64 bytes in size, the large blocks are 512 bytes in size. Because of the difference in size, Antiword would need two different methods for reading those two text streams. The method for reading the small block text stream has not been implemented yet. The result is that Word files with no large block text stream cannot be read by Antiword. Such Word files are mostly smaller than about 12 kilobytes and have less than 1024 bytes of text.

The reason for not implementing the missing feature is simple. Word documents that use the small block text stream cannot be produced by Word for Windows (all versions), but only by Word for Mac. And now by OpenOffice. Note that these documents can be read by all versions of Word.
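To make the two-method point concrete, here is a minimal sketch in Python. It is not Antiword's code and it does not parse the real on-disk layout. It assumes a simplified model in which every stream is a chain of fixed-size blocks linked through a next-block table. It also assumes, as I understand the OLE2 container format, that the small blocks are not stored directly in the file but inside a container stream that is itself made of large blocks. All names (`read_stream`, `END_OF_CHAIN`, `demo`) are hypothetical.

```python
from typing import Sequence

# Hypothetical sketch: a simplified model, not Antiword's code and not the
# real OLE2 on-disk layout.
END_OF_CHAIN = -1   # assumed sentinel marking the last block of a chain
LARGE_BLOCK = 512   # large block size named in the explanation above
SMALL_BLOCK = 64    # small block size named in the explanation above


def read_stream(data: bytes, next_block: Sequence[int], start: int,
                block_size: int, length: int) -> bytes:
    """Follow a block chain in `data` and return the first `length` bytes."""
    out = bytearray()
    seen = set()
    block = start
    while block != END_OF_CHAIN and len(out) < length:
        if block in seen:
            raise ValueError("cycle in block chain")
        seen.add(block)
        offset = block * block_size
        out += data[offset:offset + block_size]
        block = next_block[block]
    return bytes(out[:length])


def demo() -> bytes:
    # Container holding 16 small blocks; the text lives in small blocks 2 -> 0.
    container = b"B" * 64 + b"x" * 64 + b"A" * 64 + b"\0" * (1024 - 192)
    # The container itself occupies two large blocks, stored in reverse order.
    file_data = container[512:] + container[:512]
    large_next = [END_OF_CHAIN, 0]                 # large chain: 1 -> 0
    small_next = [END_OF_CHAIN, END_OF_CHAIN, 0] + [END_OF_CHAIN] * 13

    # Step 1: rebuild the container from large blocks.
    rebuilt = read_stream(file_data, large_next, start=1,
                          block_size=LARGE_BLOCK, length=1024)
    # Step 2: only now can the small block chain be followed inside it.
    return read_stream(rebuilt, small_next, start=2,
                       block_size=SMALL_BLOCK, length=100)
```

In this model a large block text stream needs only the first step, while a small block text stream needs both steps and a second chain table. That extra path is the part that would have to be added.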
Kind Regards,
Adri van Os
-- 
The Antiword Team [EMAIL PROTECTED]
http://www.winfield.demon.nl/index.html for version 0.36 (16 Oct 2004)

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org
retitle 307657 please support small block text streams
severity 307657 wishlist
thanks

* Antiword team [EMAIL PROTECTED] [2005-05-05 20:19]:
> This is not a bug, it is a missing feature.

OK.

> Let me explain.
...
> The reason for not implementing the missing feature is simple. Word
> documents that use the small block text stream cannot be produced by
> Word for Windows (all versions), but only by Word for Mac. And now by
> OpenOffice. Note that these documents can be read by all versions of
> Word.

Thanks for the explanation. I hope that you will get around to implementing this feature at some point, and I'm therefore leaving this bug report open as a feature request.
-- 
Martin Michlmayr
http://www.cyrius.com/
Bug#307657: Cannot read (certain?) DOC files created by OpenOffice.Org
Package: antiword
Version: 0.35-1
Severity: normal

I created a simple document with OpenOffice.Org 1.1.2 and saved it as a Word document. When I tried to look at it with antiword, I only got the error:

| I'm afraid the text stream of this file is too small to handle.

This is both with antiword 0.35 and 0.36.1 and with documents stored with OpenOffice.Org as Word 6.0, Word 95 and Word 97/2000/XP. I tried the file in MS Word 2003 SP1 on Windows XP and it loaded without any problems, so this seems to be a problem in antiword.

An example file is attached below. It's a fairly simple file - basically a bullet list with a number of items.

(Thanks for antiword, by the way. I'm a text-based user and really appreciate not having to load OOo just to view DOC documents people send to me.)

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.10-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages antiword depends on:
ii  libc6  2.3.2.ds1-21  GNU C Library: Shared libraries an

-- no debconf information

-- 
Martin Michlmayr
http://www.cyrius.com/

xx.doc
Description: MS-Word document