Woonsan Ko created JCR-4369:
-------------------------------

             Summary: Avoid S3 Incomplete Read Warning
                 Key: JCR-4369
                 URL: https://issues.apache.org/jira/browse/JCR-4369
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-aws-ext
            Reporter: Woonsan Ko


While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] 
Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. 
This is likely an error and may result in sub-optimal behavior. Request only 
the bytes you need via a ranged GET or drain the input stream after use.
{noformat}

The warnings are logged not only by HTTP processing threads but also by 
background threads, which made me suspect an issue in the {{S3Backend}} 
implementation itself, rather than just HTTP connections broken by clients.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} 
is enabled by default, which means the {{S3DataStore}} tries to _proactively_ 
download the binary content even when accessing metadata through 
{{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is that whenever the {{S3Backend}} reads content 
(in other words, gets an input stream on an {{S3Object}}), it is recommended to 
_read_ all data or _abort_ the input stream. Just closing the input stream is 
not good enough from the AWS SDK's perspective: it logs a warning if the input 
stream was neither fully read nor explicitly aborted. See the 
{{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, 
InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could 
be improved as follows:
- If the local cache file doesn't exist or the cache is in purge mode, it works 
as it does now: just copy everything to the local cache file and close it.
- Otherwise, it should {{abort}} the input stream.
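
The two cases above could be sketched roughly as follows. This is only an illustration, not the actual {{LocalCache}} code: the class {{LocalCacheStoreSketch}}, the {{Abortable}} interface and the {{AbortableStream}} test double are hypothetical stand-ins (in the AWS SDK, the {{abort()}} method is on {{S3ObjectInputStream}}), and the copy step is simplified to a direct file copy.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalCacheStoreSketch {

    /** Hypothetical stand-in for a stream that supports abort(), like the SDK's S3ObjectInputStream. */
    interface Abortable {
        void abort();
    }

    /** Hypothetical test double that only records whether abort() was called. */
    static class AbortableStream extends FilterInputStream implements Abortable {
        boolean aborted;

        AbortableStream(InputStream in) {
            super(in);
        }

        @Override
        public void abort() {
            aborted = true;
        }
    }

    /**
     * Simplified version of the proposed behaviour: copy the whole stream when
     * the cache file is missing or the cache is in purge mode; otherwise abort
     * the stream explicitly instead of closing it unread.
     */
    static void store(Path cacheFile, InputStream in, boolean purgeMode) throws IOException {
        try {
            if (purgeMode || !Files.exists(cacheFile)) {
                // Files.copy reads the stream to EOF, so it is fully drained.
                Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
            } else if (in instanceof Abortable) {
                // Content is already cached: the stream will not be read, so abort it.
                ((Abortable) in).abort();
            }
        } finally {
            in.close();
        }
    }
}
```

With this shape, a stream is either read to the end or aborted before {{close()}} is called, which are the two outcomes the SDK warning asks for.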

\[1\] 
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
