[jira] [Commented] (CAMEL-11697) S3 Consumer: If maxMessagesPerPoll is greater than 50 consumer fails to poll objects from bucket

MykhailoVlakh (JIRA) Wed, 23 Aug 2017 06:44:29 -0700

    [ https://issues.apache.org/jira/browse/CAMEL-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138359#comment-16138359 ]


MykhailoVlakh commented on CAMEL-11697:
---------------------------------------

Hello [~ancosen], thank you for the quick response. 
I am not sure this is the best way to fix the issue, since the user of the S3 
consumer would then have to know how many connections the consumer needs. And 
since it is not obvious that all S3 objects are opened simultaneously, the user 
will most likely ignore that setting until he/she hits a failure at run time.
I think the consumer should calculate a default value for this setting from 
the value of the maxMessagesPerPoll property, so that it always has enough 
connections unless the user chooses a custom value. Do you agree?
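
To make the proposal concrete, here is a minimal sketch of how such a default 
could be derived. It is not the camel-aws implementation: the class name, the 
method name and the userMaxConnections parameter are hypothetical, the value 50 
is the SDK pool size mentioned in the issue, and the +20 headroom is the one 
suggested there. Treating a non-positive maxMessagesPerPoll as "no limit" is 
also an assumption.

```java
// Hypothetical helper, for illustration only; not part of camel-aws.
final class S3ConnectionDefaults {

    // Default HTTP pool size of the AWS client, as stated in the issue.
    static final int SDK_DEFAULT_MAX_CONNECTIONS = 50;

    // Headroom for additional API calls while object streams are held open
    // (the "+20" suggested in the issue description).
    static final int HEADROOM = 20;

    /**
     * @param maxMessagesPerPoll configured number of objects per poll
     * @param userMaxConnections explicit user setting, or null if not set
     */
    static int resolveMaxConnections(int maxMessagesPerPoll, Integer userMaxConnections) {
        if (userMaxConnections != null) {
            // An explicit user setting always wins.
            return userMaxConnections;
        }
        if (maxMessagesPerPoll <= 0) {
            // Assumption: a non-positive value means "no per-poll limit",
            // so no pool size can be derived from it.
            return SDK_DEFAULT_MAX_CONNECTIONS;
        }
        // Compute in long arithmetic so the addition cannot overflow.
        long wanted = (long) maxMessagesPerPoll + HEADROOM;
        long capped = Math.min(Integer.MAX_VALUE, wanted);
        // Never go below the SDK default.
        return (int) Math.max(SDK_DEFAULT_MAX_CONNECTIONS, capped);
    }
}
```

The result would then be passed to ClientConfiguration.setMaxConnections(...) 
before the client is created, as in the snippet from the issue description.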

Also, I am looking at the S3 consumer code and I do not understand why the 
consumer opens all the objects right away. Why can't it open one object at a 
time, when it actually initiates an exchange for it? This seems more efficient 
and would require only one connection at a time. What do you think?
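
As a rough illustration of opening objects one at a time, the sketch below 
wraps the list of object keys in an iterator that opens a stream only when 
the next element is requested. The class name and the opener function are 
hypothetical; the opener stands in for whatever call fetches the object (the 
stack trace in the issue shows AmazonS3Client.getObject being used for this). 
It only limits the number of open connections if the caller closes each 
stream before asking for the next one.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical class, for illustration only; not part of camel-aws.
final class LazyObjectStreams implements Iterator<InputStream> {

    private final Iterator<String> keys;
    private final Function<String, InputStream> opener;

    LazyObjectStreams(List<String> keys, Function<String, InputStream> opener) {
        // Creating the iterator opens nothing; it only remembers the keys.
        this.keys = keys.iterator();
        this.opener = opener;
    }

    @Override
    public boolean hasNext() {
        return keys.hasNext();
    }

    @Override
    public InputStream next() {
        // The object is opened here, at the moment it is actually needed.
        // keys.next() throws NoSuchElementException when no keys are left.
        return opener.apply(keys.next());
    }
}
```

A consumer loop over this iterator would open a stream, build and process 
the exchange for it, close the stream, and only then move on to the next key.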

> S3 Consumer: If maxMessagesPerPoll is greater than 50 consumer fails to poll 
> objects from bucket
> ------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-11697
>                 URL: https://issues.apache.org/jira/browse/CAMEL-11697
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-aws
>    Affects Versions: 2.14.3, 2.19.2
>            Reporter: MykhailoVlakh
>            Assignee: Andrea Cosentino
>             Fix For: 2.20.0
>
>
> It is possible to configure the S3 consumer to process several S3 objects in a 
> single poll using the maxMessagesPerPoll property. 
> If this property is set to a small number, less than 50, everything works fine, 
> but if the user tries to consume more files, the S3 consumer fails every 
> time. It cannot poll files because there are not enough HTTP connections to 
> open streams for all the requested files at once. The exception looks like 
> this:
> {code}
> com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>       at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:544)
>       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:273)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3660)
>       at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1133)
>       at com.amazonaws.services.s3.AmazonS3EncryptionClient.access$201(AmazonS3EncryptionClient.java:65)
>       at com.amazonaws.services.s3.AmazonS3EncryptionClient$S3DirectImpl.getObject(AmazonS3EncryptionClient.java:524)
>       at com.amazonaws.services.s3.internal.crypto.S3CryptoModuleAE.getObjectSecurely(S3CryptoModuleAE.java:106)
>       at com.amazonaws.services.s3.internal.crypto.CryptoModuleDispatcher.getObjectSecurely(CryptoModuleDispatcher.java:114)
>       at com.amazonaws.services.s3.AmazonS3EncryptionClient.getObject(AmazonS3EncryptionClient.java:427)
>       at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1005)
>       at org.apache.camel.component.aws.s3.S3Consumer.createExchanges(S3Consumer.java:112)
>       at org.apache.camel.component.aws.s3.S3Consumer.poll(S3Consumer.java:93)
>       at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:187)
>       at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:114)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue happens because, by default, AmazonS3Client uses an HTTP client with 
> a limited number of connections in its pool: 50. 
> Since the S3 consumer allows consuming any number of S3 objects in a single 
> poll, and because it is quite common to need to process 50 or more files in a 
> single poll, I think the S3 consumer should handle this case properly. It 
> should automatically adjust the HTTP connection pool size to handle the 
> requested number of objects. This can be done like this:
> {code}
> ClientConfiguration s3Config = new ClientConfiguration();
> // Allocate 20 connections beyond maxMessagesPerPoll so that additional API
> // calls can still be made while maxMessagesPerPoll S3 object streams are
> // held open.
> s3Config.setMaxConnections(maxMessagesPerPoll + 20);
> AmazonS3Client client = new AeAmazonS3Client(awsCreds, s3Config);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
