fabriziofortino commented on PR #886:
URL: https://github.com/apache/jackrabbit-oak/pull/886#issuecomment-1494578558

   > I have some concerns that the changes will lower performance, but can't be
   > sure of it.
   > 
   > Another question on my mind is that we do not really understand what was
   > causing the files to get corrupted. TCP is a reliable protocol, so it should
   > handle and correct transmission errors by itself. I would also hope that the
   > file system is reliable, so that whatever we write from Java will be
   > correctly written to disk. So where were the errors happening? Could it be
   > bad disks? Or maybe we were not using `transferFrom` correctly, maybe it
   > needs to be called in a loop. The documentation is ambiguous:
   > 
   > > An invocation of this method may or may not transfer all of the requested
   > > bytes; whether or not it does so depends upon the natures and states of the
   > > channels. Fewer than the requested number of bytes will be transferred if
   > > the source channel has fewer than count bytes remaining, or if the source
   > > channel is non-blocking and has fewer than count bytes immediately
   > > available in its input buffer.
   > 
   > https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,long,long)
   > 
   > A socket is blocking, so it should be able to transfer everything in a
   > single call, but it is not clear.
   > 
   > The checksum validation in this PR should catch the problems and correct
   > them, so this is more of a theoretical question just to get a deeper
   > understanding of this issue.
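
   On the `transferFrom` question quoted above: since the Javadoc allows a
   single call to transfer fewer bytes than requested, the defensive option is
   to call it in a loop. The following is only a minimal sketch under
   assumptions, not code from this PR: `TransferLoop` and `transferFully` are
   hypothetical names, the source channel is assumed to be blocking, and a
   return value of 0 is treated as end of stream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, for illustration only.
final class TransferLoop {

    /**
     * Copies exactly {@code count} bytes from {@code src} into {@code dest},
     * writing from file position 0, by calling transferFrom repeatedly.
     * Assumes {@code src} is a blocking channel and {@code dest} is open for writing.
     */
    public static void transferFully(ReadableByteChannel src, FileChannel dest, long count)
            throws IOException {
        long position = 0;
        while (position < count) {
            // transferFrom may move fewer bytes than requested, so ask only
            // for what is still missing and advance by what was moved.
            long transferred = dest.transferFrom(src, position, count - position);
            if (transferred <= 0) {
                // Assumption: with a blocking source, no progress means the
                // source ended before delivering all expected bytes.
                throw new EOFException("Source ended after " + position
                        + " of " + count + " bytes");
            }
            position += transferred;
        }
    }
}
```

   A loop like this only guards against short transfers. It does not detect
   bytes that are wrong, which is what the checksum validation is for.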
   
   @nfsantos 
   While it is theoretically possible for a file to become corrupted while it
   is being written to disk, it is relatively unlikely in most cases.
   
   AFAIK the OS should provide mechanisms to ensure that file writes are atomic
   and consistent, meaning that either the entire write operation completes
   successfully or it fails and leaves the file in an unchanged state.
   
   I agree that TCP is a reliable protocol, but its integrity guarantee only
   covers the data while it is in transit: its per-segment checksum detects
   transmission errors. It cannot detect data that was altered or corrupted
   after it was received, for example in memory or on disk. An end-to-end
   checksum should detect these cases.
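
   To make the idea concrete, here is a minimal sketch of an end-to-end check:
   hash the file as it sits on disk after the transfer and compare the result
   with a digest computed by the sender. This is not the implementation in
   this PR. `ChecksumCheck`, `sha256`, and `matches` are hypothetical names,
   and SHA-256 is just an assumed choice of algorithm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class ChecksumCheck {

    /** Computes the SHA-256 digest of the file contents as read back from disk. */
    public static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    /** Returns true if the file on disk hashes to the digest supplied by the sender. */
    public static boolean matches(Path file, byte[] expectedDigest)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(expectedDigest, sha256(file));
    }
}
```

   Because the digest is recomputed from the bytes read back from disk, a
   mismatch covers corruption anywhere between the sender computing its digest
   and the receiver reading the file, and the caller can retry the download.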

