[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Woonsan Ko updated JCR-4369:
----------------------------
    Description: 
While using S3DataStore, the following logs are observed occasionally:

{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178]
Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection.
This is likely an error and may result in sub-optimal behavior. Request only
the bytes you need via a ranged GET or drain the input stream after use.
{noformat}

The warnings are logged not only by HTTP request-processing threads but also by background threads, which suggests the cause lies in the {{S3Backend}} implementation itself, not merely in HTTP connections broken by clients.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

The _minor_ problem is that whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is expected to either _read_ all the data or _abort_ the input stream. Just closing the input stream is not good enough from the AWS SDK's perspective: the SDK logs a warning if the stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved like the following:
- If the local cache file doesn't exist yet, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue in the AWS SDK. \[2\] Clients using the SDK need to _abort_ the input stream if they do not want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657
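The proposed two-branch behavior for {{LocalCache#store(String, InputStream)}} could be sketched roughly as below. This is only a minimal, self-contained illustration: {{AbortableStream}} and {{LocalCacheSketch}} are hypothetical stand-ins for the SDK's {{S3AbortableInputStream}} and Jackrabbit's {{LocalCache}}, not the real classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Stand-in for the AWS SDK's S3AbortableInputStream: abort() severs the
// underlying HTTP connection instead of draining the remaining bytes.
class AbortableStream extends ByteArrayInputStream {
    boolean aborted = false;

    AbortableStream(byte[] data) {
        super(data);
    }

    // Models S3AbortableInputStream#abort()
    void abort() {
        this.aborted = true;
    }
}

public class LocalCacheSketch {
    // If the entry is not cached yet (and the cache is not purging), drain
    // the stream fully into the cache and close it; otherwise abort the
    // stream so the SDK does not warn about an incompletely read connection.
    static byte[] store(AbortableStream in, boolean alreadyCached, boolean purgeMode)
            throws IOException {
        if (!alreadyCached && !purgeMode) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();                 // stream fully drained: no warning
            return out.toByteArray();
        }
        in.abort();                     // content not needed: abort, don't just close
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "binary payload".getBytes("UTF-8");

        // Fresh entry: stream is drained into the cache, never aborted.
        AbortableStream fresh = new AbortableStream(content);
        byte[] cached = store(fresh, false, false);
        if (!Arrays.equals(cached, content) || fresh.aborted) {
            throw new AssertionError("expected full drain into cache");
        }

        // Already-cached entry: stream is aborted instead of drained.
        AbortableStream redundant = new AbortableStream(content);
        if (store(redundant, true, false) != null || !redundant.aborted) {
            throw new AssertionError("expected abort for cached entry");
        }

        System.out.println("ok");
    }
}
```

The point of the sketch is only the branch structure: draining satisfies the SDK's close-time check, while {{abort}} is the sanctioned way to give up on the remaining bytes without a warning.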
> Avoid S3 Incomplete Read Warning
> --------------------------------
>
>                 Key: JCR-4369
>                 URL: https://issues.apache.org/jira/browse/JCR-4369
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-aws-ext
>    Affects Versions: 2.16.3, 2.17.5
>            Reporter: Woonsan Ko
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)