[
https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178]
Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal
behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}
These warnings are logged not only by HTTP request-processing threads but also
by background threads, which suggests the cause lies in the {{S3DataStore}}
implementation itself rather than in HTTP connections broken by clients.
For the record, this is not a major issue: the AWS SDK merely logs the warning
as a _recommendation_ and still closes the underlying HttpRequest object
properly, so functionality is not affected.
After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}}
is enabled by default, which means the {{S3DataStore}} tries to _proactively_
download the binary content, asynchronously in a new thread, even when
accessing metadata through {{#getLastModified(...)}} and {{#getLength(...)}}.
The _minor_ problem is that whenever {{S3Backend}} reads content (in other
words, obtains an input stream on an {{S3Object}}), it is expected to either
_read_ all the data or _abort_ the input stream. Merely closing the stream is
not good enough from the AWS SDK's perspective: the SDK logs a warning whenever
the stream is neither fully read nor explicitly aborted. See the
{{S3AbortableInputStream#close()}} method. \[1\]
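For illustration, here is a minimal sketch of the pattern the SDK expects when
a caller intentionally stops reading early. The bucket and key names are
placeholders and not part of the Jackrabbit code:
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

public class PartialReadExample {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // "some-bucket" and "some-key" are placeholders for this example.
        S3Object object = s3.getObject("some-bucket", "some-key");
        S3ObjectInputStream in = object.getObjectContent();
        try {
            // Read only the first chunk; the rest of the object is not needed.
            byte[] buffer = new byte[8192];
            in.read(buffer);
        } finally {
            // abort() tells the SDK the remaining bytes are skipped on purpose,
            // so S3AbortableInputStream#close() does not log the warning.
            in.abort();
            in.close();
        }
    }
}
{code}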
Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String,
InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could
be improved as follows (see the sketch after this list):
- If the local cache file doesn't exist, or the cache is in purge mode, keep
the current behavior: copy everything to the local cache file and close the
stream.
- Otherwise, {{abort}} the input stream instead.
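A rough, hypothetical sketch of that change follows. The class is a stand-in
for {{LocalCache}}; the cache-root location, the purge-mode flag, and the
direct {{instanceof}} check on the AWS type are assumptions made only for
illustration:
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;

import com.amazonaws.services.s3.model.S3ObjectInputStream;

// Illustrative stand-in for LocalCache; not the actual Jackrabbit class.
public class LocalCacheSketch {

    private final File cacheRoot = new File("/tmp/s3-cache"); // placeholder location
    private volatile boolean purgeMode;                       // assumed flag

    public void store(String fileName, InputStream in) throws IOException {
        File cacheFile = new File(cacheRoot, fileName);
        if (!cacheFile.exists() || purgeMode) {
            // Current behavior: drain the stream completely into the cache file.
            try (OutputStream out = new FileOutputStream(cacheFile)) {
                IOUtils.copy(in, out);
            } finally {
                in.close();
            }
        } else {
            // The cache entry already exists, so the remaining bytes will never
            // be read; abort the S3-backed stream instead of merely closing it,
            // which avoids the S3AbortableInputStream warning.
            if (in instanceof S3ObjectInputStream) {
                ((S3ObjectInputStream) in).abort();
            }
            in.close();
        }
    }
}
{code}
Since {{LocalCache}} itself should probably not depend on the AWS SDK directly,
a real patch would more likely expose the abort through a small abstraction
implemented on the backend side rather than the {{instanceof}} check shown
above.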
This is a known behavior of the AWS SDK: clients that do not intend to read the
data fully are expected to _abort_ the input stream. \[2\]
\[1\]
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657
> Avoid S3 Incomplete Read Warning
> --------------------------------
>
> Key: JCR-4369
> URL: https://issues.apache.org/jira/browse/JCR-4369
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: jackrabbit-aws-ext
> Affects Versions: 2.16.3, 2.17.5
> Reporter: Woonsan Ko
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)