Chaitanya created APEXMALHAR-2174:
-------------------------------------
Summary: S3 File Reader reading more data than expected
Key: APEXMALHAR-2174
URL: https://issues.apache.org/jira/browse/APEXMALHAR-2174
Project: Apache Apex Malhar
Issue Type: Bug
Reporter: Chaitanya
Assignee: Chaitanya
This was observed through the AWS billing, which shows more data being read from S3 than expected.
The issue might be S3InputStream.read(), which is used in readEntity().
Each block can instead be read with a ranged request through the AmazonS3 API, so I
propose the following solution:
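```
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.google.common.io.ByteStreams;

// Fetch only this block (setRange() takes an inclusive [start, end] range), then close the stream.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
rangeObjectRequest.setRange(startByte, startByte + noOfBytes - 1);
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
try (S3ObjectInputStream wrappedStream = objectPortion.getObjectContent()) {
  byte[] record = ByteStreams.toByteArray(wrappedStream);
}
```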
Advantage of this solution: parallel reads will work for all types of S3 file
systems.
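The proposal relies on each reader requesting exactly one block. The sketch below does not call the AWS SDK; it only illustrates the range arithmetic. `blockRanges` and `rangedRead` are hypothetical helpers, and `rangedRead` is an in-memory stand-in for a ranged `getObject`. It assumes `GetObjectRequest.setRange(start, end)` takes an inclusive end offset, as in the AWS SDK for Java v1, so a block of `noOfBytes` bytes starting at `startByte` ends at `startByte + noOfBytes - 1`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockRanges {

  /**
   * Hypothetical helper: splits an object of fileLength bytes into blocks of at most
   * blockSize bytes. Each entry is an inclusive {start, end} pair, matching the
   * convention assumed for GetObjectRequest.setRange(start, end).
   */
  static List<long[]> blockRanges(long fileLength, long blockSize) {
    List<long[]> ranges = new ArrayList<>();
    for (long start = 0; start < fileLength; start += blockSize) {
      long noOfBytes = Math.min(blockSize, fileLength - start);
      // The end offset is inclusive, so the last byte is start + noOfBytes - 1.
      ranges.add(new long[] {start, start + noOfBytes - 1});
    }
    return ranges;
  }

  /** In-memory stand-in for a ranged GET: returns only the bytes in [start, end]. */
  static byte[] rangedRead(byte[] object, long start, long end) {
    return Arrays.copyOfRange(object, (int) start, (int) (end + 1));
  }

  public static void main(String[] args) {
    byte[] object = new byte[10];
    for (int i = 0; i < object.length; i++) {
      object[i] = (byte) i;
    }
    long fetched = 0;
    // Each block could be fetched by a different reader; the ranges never overlap.
    for (long[] r : blockRanges(object.length, 4)) {
      byte[] block = rangedRead(object, r[0], r[1]);
      fetched += block.length;
      // prints "0-3: 4 bytes", "4-7: 4 bytes", "8-9: 2 bytes"
      System.out.println(r[0] + "-" + r[1] + ": " + block.length + " bytes");
    }
    // prints "total fetched: 10": every byte is fetched exactly once
    System.out.println("total fetched: " + fetched);
  }
}
```

Because the ranges are disjoint and together cover the object exactly once, separate readers could fetch their blocks in parallel without reading past their block boundary, which is the property the proposal depends on.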
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)