Robert Cooper created IO-718:
--------------------------------

             Summary: FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
                 Key: IO-718
                 URL: https://issues.apache.org/jira/browse/IO-718
             Project: Commons IO
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 2.8.0
            Reporter: Robert Cooper


When {{FileUtils.checksumCRC32}} is called from multiple threads (in order to 
improve throughput when calculating CRCs for a large folder), the code is not 
thread-safe and returns incorrect CRC values.

The following simple test demonstrates the issue:
{code:java}
@Test
public void checksumCRC32IsConsistentAcrossThreads() throws ExecutionException, InterruptedException {
  File testFile = new File("C:\\Temp\\large-file.txt");
  // With a pool size of 1 the assertion below holds:
  // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
  try {
    // Checksum the same file 20 times, concurrently.
    List<Future<Long>> futures = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
    }
    List<Long> crcs = new ArrayList<>();
    for (Future<Long> future : futures) {
      crcs.add(future.get());
    }
    // Every run should produce the same CRC as the first one.
    Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
  } finally {
    scheduler.shutdown();
  }
}
{code}
In the above code, with a thread-pool size of 1, all calculated CRCs for the 
file are the same.  With a thread-pool size greater than 1, the CRCs differ.

The issue appears to be related to the use of a common {{SKIP_BYTE_BUFFER}} in 
{{IOUtils.consume}}.  All threads read into the same buffer while the data is 
being "discarded".  However, {{FileUtils.checksum}} uses a 
{{CheckedInputStream}} to calculate the CRC, and that stream computes the 
checksum from the bytes read into the shared buffer.  With multiple threads 
writing to that buffer, the CRC calculation breaks down.
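
As a possible workaround until this is addressed, the file can be drained through a buffer that is local to each call, so that no two threads share it. The following is only a minimal sketch using JDK classes; the class name {{PerCallBufferChecksum}} and the 8192-byte buffer size are hypothetical choices for illustration and are not part of Commons IO:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper for illustration only; not part of Commons IO.
final class PerCallBufferChecksum {

    private PerCallBufferChecksum() {
    }

    /** Returns the CRC-32 of the file, reading it through a buffer owned by this call. */
    static long checksumCRC32(File file) throws IOException {
        CRC32 crc = new CRC32();
        // The buffer is a local variable, so no other thread can write into it.
        byte[] buffer = new byte[8192]; // size is an arbitrary assumption
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file.toPath()), crc)) {
            // CheckedInputStream updates the checksum from the bytes placed in 'buffer'.
            while (in.read(buffer) != -1) {
                // Nothing to do: the data is read only to feed the checksum.
            }
        }
        return crc.getValue();
    }
}
```

Replacing {{FileUtils.checksumCRC32(testFile)}} with {{PerCallBufferChecksum.checksumCRC32(testFile)}} in the test above should give identical CRCs regardless of pool size, since each call owns both its {{CRC32}} instance and its buffer.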



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
