https://issues.apache.org/bugzilla/show_bug.cgi?id=52446
Antoni Mylka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #2 from Antoni Mylka 2012-01-11 10:40:17 UTC ---
I took a very close look in the debugger. POIFSViewer seems to work at a higher level, where blocks are already combined into streams.

I know nothing about the POI format, but from what I understand it goes like this: NPropertyTable is constructed with an iterator over byte buffers. Each byte buffer represents a single block; in this file the blocks are 512 bytes large. The NPropertyTable constructor goes through this stack trace twice:

  ByteArrayBackedDataSource.read(int, long) line: 48
  NPOIFSFileSystem.getBlockAt(int) line: 420
  NPOIFSStream$StreamBlockByteBufferIterator.next() line: 213
  NPOIFSStream$StreamBlockByteBufferIterator.next() line: 1
  NPropertyTable.buildProperties(Iterator<ByteBuffer>, POIFSBigBlockSize) line: 84

The first time, getBlockAt is called with 946. At offset 947*512 = 484864 within the file (the 512-byte header comes before block 0), there are four UTF-16 strings: "Root Entry", "Data", "1Table" and "WordDocument". AFAIU these are the names of the top-level directory entries. This block is parsed correctly by PropertyFactory.convertToProperties(data, properties).

Then comes the second block, index 956. It also ends up in ByteArrayBackedDataSource.read(int, long) line: 48. Unfortunately, 957*512 + 512 = 490496 exceeds the size of the file, so the returned byte buffer is only 510 bytes large, hence the BufferUnderflowException.

I don't know how many blocks there should be (there is a BAT, but I don't understand it). What I do know is that this file was truncated somewhere along the way.

When the second block is parsed with 510 bytes, PropertyFactory.convertToProperties begins with

  int property_count = data.length / POIFSConstants.PROPERTY_SIZE;

In my case this evaluates to 3 (510 / 128 = 3, with 126 bytes left over). The last 126 bytes are not taken into account, hence no errors.
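The block arithmetic above can be double-checked with a small sketch. It assumes 512-byte blocks placed right after a 512-byte header and 128-byte properties, as described in the comment; the class BlockMath and its methods are hypothetical helpers, not POI code. The file length is an assumption inferred from the 510-byte read, and the last lines only show that a plain java.nio.ByteBuffer holding 510 bytes throws BufferUnderflowException when asked for 512; which POI call actually fails is not shown here.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical helper for checking the numbers in this comment; not POI code.
final class BlockMath {
    static final int BIG_BLOCK_SIZE = 512; // block size used by this file
    static final int PROPERTY_SIZE = 128;  // one directory entry (property)

    // Byte offset of block `index`: the 512-byte header precedes block 0.
    static long blockOffset(int index) {
        return (long) (index + 1) * BIG_BLOCK_SIZE;
    }

    // Bytes a read clipped at end of file can return for block `index`.
    static int availableBytes(int index, long fileLength) {
        long remaining = fileLength - blockOffset(index);
        return (int) Math.max(0, Math.min(BIG_BLOCK_SIZE, remaining));
    }

    // Complete properties in a buffer of `length` bytes (integer division).
    static int completeProperties(int length) {
        return length / PROPERTY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blockOffset(946)); // 484864
        System.out.println(blockOffset(956)); // 489984

        // Assumption: file length inferred from the 510-byte read of block 956.
        long fileLength = blockOffset(956) + 510;
        int got = availableBytes(956, fileLength);
        System.out.println(got);                     // 510
        System.out.println(completeProperties(got)); // 3
        System.out.println(got % PROPERTY_SIZE);     // 126

        // Asking a 510-byte buffer for a full 512-byte block underflows.
        ByteBuffer block = ByteBuffer.wrap(new byte[got]);
        try {
            block.get(new byte[BIG_BLOCK_SIZE]);
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException");
        }
    }
}
```

Integer division is what makes the short block "work": the three complete 128-byte properties are read and the 126-byte remainder is silently dropped.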
The second block, when viewed in XVI, shows the UTF-16 strings "SummaryInformation", "DocumentSummaryInformation" and "\u0001CompObj" (the three "correct" properties). The fourth, truncated property contains nothing but zeros apart from a run of 0xFF bytes (which look like the unused sibling/child pointers of an empty directory entry):

  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Therefore no information is lost, and I think my workaround is actually correct.
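To back up the claim that the truncated fourth property is empty, a check along these lines could be run on the leftover bytes. It is a sketch that assumes the 128-byte directory-entry layout of the Microsoft compound file format (64-byte UTF-16LE name, 2-byte name length in bytes including the terminating NUL at 0x40, 1-byte object type at 0x42). DirEntryCheck, name and looksUnused are hypothetical and are not POI's PropertyFactory; the position of the 0xFF bytes in the example entry is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI. Offsets assume the compound
// file directory-entry layout described in the lead-in.
final class DirEntryCheck {
    static final int NAME_AREA = 64;          // bytes reserved for the name
    static final int NAME_LENGTH_OFFSET = 0x40;
    static final int TYPE_OFFSET = 0x42;      // 0 = unallocated, 2 = stream

    // Decodes the entry name, or returns "" if the entry has no valid name.
    static String name(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry).order(ByteOrder.LITTLE_ENDIAN);
        int nameBytes = buf.getShort(NAME_LENGTH_OFFSET) & 0xFFFF;
        if (nameBytes < 2 || nameBytes > NAME_AREA) {
            return "";
        }
        // The length includes a 2-byte terminating NUL, which is dropped.
        return new String(entry, 0, nameBytes - 2, StandardCharsets.UTF_16LE);
    }

    // True if a (possibly truncated) entry has no name and object type 0.
    static boolean looksUnused(byte[] entry) {
        if (entry.length <= TYPE_OFFSET) {
            return false; // too short to tell
        }
        return name(entry).isEmpty() && entry[TYPE_OFFSET] == 0;
    }

    public static void main(String[] args) {
        // 126-byte entry modelled on the dump: zeros plus a run of 0xFF
        // (assumed to be the sibling/child IDs at 0x44..0x4F).
        byte[] truncated = new byte[126];
        Arrays.fill(truncated, 0x44, 0x50, (byte) 0xFF);
        System.out.println(looksUnused(truncated)); // true

        // A normal entry named "WordDocument" of type stream.
        byte[] named = new byte[128];
        byte[] utf16 = "WordDocument".getBytes(StandardCharsets.UTF_16LE);
        System.arraycopy(utf16, 0, named, 0, utf16.length);
        ByteBuffer.wrap(named).order(ByteOrder.LITTLE_ENDIAN)
                .putShort(NAME_LENGTH_OFFSET, (short) (utf16.length + 2));
        named[TYPE_OFFSET] = 2;
        System.out.println(name(named));        // WordDocument
        System.out.println(looksUnused(named)); // false
    }
}
```

On the real file, one would pass the 126 bytes that follow offset 384 (3 * 128) of the short 510-byte block.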
