Dear POI dev team,

We've been experiencing a nondeterminism problem with POI's xlsx format when
generating hash values with the following method in TestNG test cases:

FileTest:
@Test(enabled = true) // indeterminism at random iterations, such as 400 or 1290
public void emptyXLSXTest() throws IOException, NoSuchAlgorithmException {
    final Hasher hasher = new HasherImpl();
    boolean differentSHA256Hash = false;
    for (int i = 0; i < 10000; i++) {
        final ByteArrayOutputStream excelAdHoc1 = BusinessPlanInMemory.getEmptyExcel("xlsx");
        final ByteArrayOutputStream excelAdHoc2 = BusinessPlanInMemory.getEmptyExcel("xlsx");

        byte[] expectedByteArray = excelAdHoc1.toByteArray();
        String expectedSha256 = hasher.sha256(expectedByteArray);
        byte[] actualByteArray = excelAdHoc2.toByteArray();
        String actualSha256 = hasher.sha256(actualByteArray);

        if (!expectedSha256.equals(actualSha256)) {
            differentSHA256Hash = true;
            System.out.println("ITERATION: " + i);
            System.out.println("EXPECTED HASH: " + expectedSha256);
            System.out.println("ACTUAL HASH: " + actualSha256);
            break;
        }
    }
    Assert.assertTrue(differentSHA256Hash, "Indeterminism did not occur");
}


Referenced Hasher and POI code:

HasherImpl:
public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
    final MessageDigest digest = MessageDigest.getInstance("SHA-256");
    final byte[] bytesBuffer = new byte[300000];
    int bytesRead = -1;
    while ((bytesRead = stream.read(bytesBuffer)) != -1) {
        digest.update(bytesBuffer, 0, bytesRead);
    }
    final byte[] hashedBytes = digest.digest();
    return bytesToHex(hashedBytes);
}
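
The test passes byte arrays to the hasher, while the method above takes an InputStream, and bytesToHex is not shown. For completeness, here is a minimal, self-contained sketch of how those pieces could fit together. The class name Sha256Sketch, the byte[] overload and the bytesToHex body are assumptions for illustration only, not necessarily what our HasherImpl contains:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for HasherImpl; only the stream method mirrors the code above.
final class Sha256Sketch {
    private Sha256Sketch() {
    }

    // Assumed convenience overload: wraps the array and reuses the stream variant.
    static String sha256(final byte[] data) throws IOException, NoSuchAlgorithmException {
        return sha256(new ByteArrayInputStream(data));
    }

    // Same logic as the stream-based method shown above.
    static String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] bytesBuffer = new byte[300000];
        int bytesRead;
        while ((bytesRead = stream.read(bytesBuffer)) != -1) {
            digest.update(bytesBuffer, 0, bytesRead);
        }
        return bytesToHex(digest.digest());
    }

    // Assumed helper: lower-case hex encoding, two characters per byte.
    static String bytesToHex(final byte[] bytes) {
        final StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (final byte b : bytes) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }
}
```

The hash depends only on the bytes passed in, so two different hashes mean the two generated files really differ in at least one byte.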


We tried to eliminate nondeterminism caused by metadata such as the creation
time, to no avail:

public static ByteArrayOutputStream getEmptyExcel(final String fileextension) throws IOException {
    Workbook wb;

    if (fileextension.equals("xls")) {
        wb = new HSSFWorkbook();
    }
    else {
        wb = new XSSFWorkbook();
        final POIXMLProperties props = ((XSSFWorkbook) wb).getProperties();
        final POIXMLProperties.CoreProperties coreProp = props.getCoreProperties();
        coreProp.setCreated("");
        coreProp.setIdentifier("1");
        coreProp.setModified("");
    }

    wb.createSheet();

    final ByteArrayOutputStream excelStream = new ByteArrayOutputStream();
    wb.write(excelStream);
    wb.close();
    return excelStream;
}


The nondeterminism occurs at random iterations, such as 400 or 1290, and we
have not yet found out why this happens.
Do you have any clue what might be causing the problem? Maybe some metadata
flag we have not addressed, internal compression, or anything else?
HSSF / ".xls", by contrast, does not seem to have the same issue, btw.
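
Since an xlsx file is a ZIP container, one way to narrow this down would be to compare the two outputs entry by entry rather than as a whole. The following is only a rough sketch using java.util.zip; the class and method names are made up for illustration. It lists the entries whose timestamp, CRC or size differ, which should at least tell us whether the difference sits in the ZIP metadata (for example the per-entry timestamps, which we have not ruled out) or in the XML content itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, not part of POI.
final class XlsxZipDiffSketch {
    private XlsxZipDiffSketch() {
    }

    // Maps each entry name to a short description of its timestamp and content.
    static Map<String, String> describeEntries(final byte[] zipBytes) throws IOException {
        final Map<String, String> result = new LinkedHashMap<>();
        final byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Compute CRC and size from the decompressed content ourselves.
                final CRC32 crc = new CRC32();
                long size = 0;
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    crc.update(buffer, 0, n);
                    size += n;
                }
                result.put(entry.getName(), "time=" + entry.getTime()
                        + " crc=" + Long.toHexString(crc.getValue())
                        + " size=" + size);
            }
        }
        return result;
    }

    // Returns one line per entry that is missing on one side or described differently.
    static List<String> diffEntries(final byte[] zipA, final byte[] zipB) throws IOException {
        final Map<String, String> a = describeEntries(zipA);
        final Map<String, String> b = describeEntries(zipB);
        final Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        final List<String> differences = new ArrayList<>();
        for (final String name : names) {
            final String left = a.get(name);
            final String right = b.get(name);
            if (!Objects.equals(left, right)) {
                differences.add(name + ": " + left + " vs " + right);
            }
        }
        return differences;
    }
}
```

In the test above this could be called as XlsxZipDiffSketch.diffEntries(expectedByteArray, actualByteArray) inside the branch where the hashes differ. The sketch does not detect a different entry order or different compression settings, so an empty result would point towards those.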


Best regards,
Armin


