Chaitanya created APEXMALHAR-2174:
-------------------------------------

             Summary: S3 File Reader reading more data than expected
                 Key: APEXMALHAR-2174
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2174
             Project: Apache Apex Malhar
          Issue Type: Bug
            Reporter: Chaitanya
            Assignee: Chaitanya


The S3 file reader reads more data than expected; this was observed through
AWS billing. The cause might be S3InputStream.read(), which is used in
readEntity().

A block can be read directly through the AmazonS3 API, so I am proposing
the following solution:
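```
      GetObjectRequest rangeObjectRequest = new GetObjectRequest(
          bucketName, key);
      // setRange takes inclusive start and end offsets, not a byte count.
      rangeObjectRequest.setRange(startByte, startByte + noOfBytes - 1);
      S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
      byte[] record;
      // Close the stream so the underlying HTTP connection is released.
      try (S3ObjectInputStream wrappedStream = objectPortion.getObjectContent()) {
        record = ByteStreams.toByteArray(wrappedStream);
      }
```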

Advantage of this solution: parallel reads will work for all types of S3 file
systems.
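
To illustrate how parallel readers could each request only their own block, here is a minimal sketch that splits an object into inclusive byte ranges. The class BlockRanges and its split method are hypothetical names used for illustration, not existing Malhar code, and the block size is assumed to be chosen by the caller. Each returned pair could be passed as the start and end arguments of a ranged GetObjectRequest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Malhar): splits an object of a known
// length into inclusive [start, end] byte ranges, one per block.
final class BlockRanges {
  private BlockRanges() {}

  // Returns one long[]{start, end} pair per block; both offsets are inclusive,
  // matching the convention of HTTP Range requests.
  static List<long[]> split(long objectLength, long blockSize) {
    if (objectLength < 0 || blockSize <= 0) {
      throw new IllegalArgumentException(
          "objectLength must be >= 0 and blockSize must be > 0");
    }
    List<long[]> ranges = new ArrayList<>();
    for (long start = 0; start < objectLength; start += blockSize) {
      // The last block may be shorter than blockSize.
      long end = Math.min(start + blockSize, objectLength) - 1;
      ranges.add(new long[] {start, end});
    }
    return ranges;
  }
}
```

For example, BlockRanges.split(10, 4) yields the ranges 0-3, 4-7, and 8-9, so no two readers request overlapping bytes.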



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
