[ 
https://issues.apache.org/jira/browse/IO-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285944#comment-17285944
 ] 

Gary D. Gregory commented on IO-718:
------------------------------------

Note that the char buffer was, and still is, documented as not thread-safe, 
and its documentation mentions that buffers (plural) are shared; this is not 
explicitly stated for the byte buffer, however.

Regardless, this is easy to address. Please see git master or the latest 
snapshot build.

In general, do not assume thread-safety unless it is stated.
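
For callers who cannot upgrade yet, one way to avoid depending on any shared 
buffer is to compute the checksum with a buffer allocated per call. The 
following is only a minimal sketch under that assumption: the class name 
LocalBufferCrc and the method crc32 are hypothetical and are not part of 
Commons IO, and this is not necessarily how the fix on git master is written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not a Commons IO class.
final class LocalBufferCrc {

    // Computes a CRC32 using a buffer that is local to this call,
    // so concurrent callers never write into each other's data.
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192]; // allocated per call, never shared
        int n;
        while ((n = in.read(buffer)) != -1) {
            crc.update(buffer, 0, n);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            // Submit the same input 20 times across 5 threads.
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                futures.add(pool.submit(() -> crc32(new ByteArrayInputStream(data))));
            }
            // Every task should report the same value.
            for (Future<Long> f : futures) {
                System.out.println(Long.toHexString(f.get()));
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The input "123456789" is used because its CRC-32 is the well-known check 
value 0xCBF43926, which makes a wrong result easy to spot.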


> FileUtils.checksumCRC32 and FileUtils.checksum are not thread safe
> ------------------------------------------------------------------
>
>                 Key: IO-718
>                 URL: https://issues.apache.org/jira/browse/IO-718
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.8.0
>         Environment: Apache Commons Io 2.8.0.
> JDK 1.8.0_181.
>            Reporter: Robert Cooper
>            Priority: Major
>
> When calling {{FileUtils.checksumCRC32}} from multiple threads (in order to 
> improve throughput when calculating CRCs for a large folder), the code is 
> not thread-safe, resulting in incorrect CRC output.
> The following simple test demonstrates the issue:
> {code:java}
> @Test
> public void should() throws ExecutionException, InterruptedException {
>   File testFile = new File("C:\\Temp\\large-file.txt");
>   // ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
>   ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(5);
>   List<Future<Long>> futures = new ArrayList<>();
>   for (int i = 0; i < 20; i++) {
>     futures.add(scheduler.submit(() -> FileUtils.checksumCRC32(testFile)));
>   }
>   List<Long> crcs = new ArrayList<>();
>   for (Future<Long> future : futures) {
>     crcs.add(future.get());
>   }
>   // Release the pool threads once all results have been collected.
>   scheduler.shutdown();
>   Assertions.assertThat(crcs).allMatch(c -> crcs.get(0).equals(c));
> }
> {code}
> In the above code, with a thread-pool size of 1, all calculated CRCs for the 
> file are the same.  With a thread-pool size greater than 1, the CRCs differ.
> The issue appears to be related to the use of a common {{SKIP_BYTE_BUFFER}} 
> in {{IOUtils.consume}}.  The multiple threads all read into the same buffer 
> as the data is being "discarded".  However, {{FileUtils.checksum}} uses a 
> {{CheckedInputStream}} to calculate the CRC, which uses the value read into 
> the shared buffer.  With multiple threads writing to that buffer the CRC 
> mechanism breaks down.
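
The interleaving described in the report can be imitated in a single thread, 
which makes the failure deterministic. The sketch below is only an 
illustration: the class SharedBufferRaceSketch and its methods are 
hypothetical, the "shared" array merely stands in for a shared skip buffer, 
and the two reads are performed sequentially to mimic what two threads at 
different file offsets could do.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical single-threaded imitation of the race; not Commons IO code.
final class SharedBufferRaceSketch {

    static final byte[] FILE = "123456789".getBytes(StandardCharsets.US_ASCII);

    // CRC32 of the first three bytes, taken from a private buffer.
    static long correctCrc() {
        byte[] own = new byte[3];
        System.arraycopy(FILE, 0, own, 0, 3);
        CRC32 crc = new CRC32();
        crc.update(own, 0, 3);
        return crc.getValue();
    }

    // Same computation, but through a buffer another reader overwrites
    // between the read and the checksum update.
    static long racedCrc() {
        byte[] shared = new byte[3];
        // "Thread A" reads bytes 0..2 into the shared buffer.
        System.arraycopy(FILE, 0, shared, 0, 3);
        // "Thread B", at a later offset, reads bytes 3..5 into the same buffer.
        System.arraycopy(FILE, 3, shared, 0, 3);
        // "Thread A" now updates its checksum, but sees B's bytes.
        CRC32 crc = new CRC32();
        crc.update(shared, 0, 3);
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println("private buffer: " + Long.toHexString(correctCrc()));
        System.out.println("shared buffer:  " + Long.toHexString(racedCrc()));
    }
}
```

The two values differ because the second computation checksums "456" 
instead of "123". With real threads the overwrite happens only some of the 
time, which is consistent with the intermittent mismatches in the test above.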



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
