Tien,

Glad to hear that. One more thing: the 10K problem on a single node is
quite different from the 10K problem in a cluster. There will also be
significant jitter in the service during a deployment or when a failure
occurs. So it is better to split consumers across small nodes, and to
adjust or split providers into groups where necessary. Keep it
lightweight and simple.

best regards,

Jason
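As a concrete illustration of splitting providers into groups, here is a
minimal XML sketch using Dubbo's group and connections attributes (the
interface name com.example.DemoService and the group name are
hypothetical, and the values are illustrative only):

    <!-- Provider side: expose instances of the same interface under
         separate groups so consumers can be partitioned across them. -->
    <dubbo:service interface="com.example.DemoService" ref="demoService"
                   group="group-a" />

    <!-- Consumer side: bind to one group, and open more than one TCP
         connection to a busy provider, as suggested earlier in this
         thread. -->
    <dubbo:reference id="demoService" interface="com.example.DemoService"
                     group="group-a" connections="2" />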
> On Nov 28, 2019, at 18:16, Tien Dat PHAN <[email protected]> wrote:
> 
> Dear Jason,
> 
> Thanks a lot for your help so far.
> We just found the issue.
> Our laptop on the consumer side has 2 cores / 4 threads, while the
> provider is running on a VM with 48 cores. So we thought that on the
> consumer side, running with 4 threads, or at most 8, would be enough
> to use the full capacity of our laptop.
> But we were wrong. When we increased the consuming process from 8
> threads to 16, 32, and 48 threads, the throughput rose from 40K TPS
> to 70K TPS, 98K TPS, and 100K TPS, respectively. So we suspect that,
> on the consumer side, throughput is maximized when the consumer's
> parallelism reaches the maximum parallelism available on the provider
> side.
> 
> This raises a question: if the consumer sends requests to a cluster
> of providers, the number of threads might need to be increased to a
> very high number. This still needs to be benchmarked to confirm.
> 
> That is our information so far. We hope it is helpful as well.
> Thank you again for your help.
> 
> Best regards
> Dat
> 
> On 2019/11/27 16:24:18, Jason Joo <[email protected]> wrote:
>> Tien,
>> 
>> Alternatively, you can keep a single consumer but set the busy
>> target service's "connections" option to "2" or a larger value.
>> 
>> If you integrated Dubbo using XML, see:
>> http://dubbo.apache.org/en-us/docs/user/references/xml/dubbo-service.html
>> 
>> Or, if you integrated Dubbo using annotations:
>> @Reference(connections = 2)
>> 
>> Give that a try as well.
>> 
>> best regards,
>> 
>> Jason
>> 
>>> On Nov 27, 2019, at 22:55, Tien Dat PHAN <[email protected]> wrote:
>>> 
>>> Dear Jason,
>>> 
>>> You were right. When we ran two consumer processes, the total
>>> throughput almost doubled. This means the infrastructure and OS
>>> (the laptop and macOS in this case) are capable of a higher
>>> throughput.
>>> 
>>> So the suspicion falls back on the configuration of the consumer
>>> process.
>>> 
>>> We have tried the Arthas tool that you suggested. A very useful
>>> tool in general. However, it still does not show any interesting
>>> lead that could explain the problem.
>>> 
>>> We are stuck for now :(
>>> 
>>> Best
>>> Dat
>>> 
>>> On 2019/11/27 10:38:32, Jason Joo <[email protected]> wrote:
>>>> Tien,
>>>> 
>>>> A performance issue is a complicated problem, and the open-files
>>>> limit you mentioned in your last mail is not the only factor.
>>>> Given that Dubbo uses persistent connections by design, I don't
>>>> think that limit is the cause of your issue, and the value you saw
>>>> on your work machine may not be the same on your servers in
>>>> production. Besides, if you hit the limit, error entries will
>>>> appear in your log files.
>>>> 
>>>> You mentioned that only one core reaches a high load; I think you
>>>> should check the IO worker count in the configuration. An
>>>> effective debugging tool here is Arthas [1]: you can dig into the
>>>> JVM process and check which thread is spinning at 100% CPU time.
>>>> 
>>>> For further performance optimization you should also tune the JVM
>>>> parameters, but only after solving this issue.
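For reference, the IO worker count mentioned above is configured on the
provider's dubbo:protocol element. A minimal sketch follows; iothreads
and threads are dubbo:protocol attributes, but the values shown here are
illustrative assumptions, not tested recommendations:

    <!-- iothreads: Netty IO worker count (default: CPU cores + 1).
         threads: size of the business thread pool (default: 200). -->
    <dubbo:protocol name="dubbo" port="20880" iothreads="9" threads="200" />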
>>>> 
>>>> [1] https://github.com/alibaba/arthas
>>>> 
>>>> best regards,
>>>> 
>>>> Jason
>>>> 
>>>>> On Nov 27, 2019, at 18:25, Tien Dat PHAN <[email protected]> wrote:
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> We just found that the ulimit on the consumer side is relatively
>>>>> low (ulimit -n returns 256; we are testing on macOS). We suspect
>>>>> this may be the root cause of the low throughput we have
>>>>> observed. Do any of you have experience with performance
>>>>> degradation due to a system limit?
>>>>> 
>>>>> Best regards
>>>>> Tien Dat
>>>>> 
>>>>> On 2019/11/19 10:56:27, Tien Dat PHAN <[email protected]> wrote:
>>>>>> Dear experts,
>>>>>> 
>>>>>> Could any of you shed some light on this topic?
>>>>>> 
>>>>>> Best regards
>>>>>> Tien Dat PHAN
>>>>>> 
>>>>>> On 2019/11/14 15:07:09, Tien Dat PHAN <[email protected]> wrote:
>>>>>>> Dear experts,
>>>>>>> 
>>>>>>> We have a powerful server (48 cores, four disks, and 512 GB
>>>>>>> RAM) providing the services.
>>>>>>> From my laptop (2 cores / 4 threads), we consume the public
>>>>>>> services.
>>>>>>> To benchmark the throughput of our service provider, we simply
>>>>>>> invoke a very basic method: the consumer sends an object and
>>>>>>> the provider receives it; no storage or cache is involved after
>>>>>>> that. The object size is only 1K.
>>>>>>> 
>>>>>>> We use the Dubbo protocol for our service provider.
>>>>>>> We noticed that the consumer does not seem to use the second
>>>>>>> core of my laptop (it saturates only one core). This results in
>>>>>>> a throughput of 4,000 requests per second.
>>>>>>> To prove that our laptop can reach a higher throughput, we ran
>>>>>>> a second consumer process. With two processes running
>>>>>>> simultaneously, we reached 7,500 requests per second and almost
>>>>>>> double the total CPU usage.
>>>>>>> Our consumer implementation is multi-threaded, so we don't
>>>>>>> think the implementation is the issue. We suspect that some
>>>>>>> configuration is needed to improve performance and concurrency.
>>>>>>> For the above results, we just used Dubbo's default
>>>>>>> configuration.
>>>>>>> 
>>>>>>> Do you have any best-practice examples for improving the
>>>>>>> performance and concurrency of a consumer? If so, could you
>>>>>>> please enlighten us?
>>>>>>> 
>>>>>>> Thank you in advance.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> Tien Dat PHAN
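The thread-count experiment described in this thread (8, 16, 32, and 48
consumer threads driving the same synchronous call) can be reproduced
with a sketch along these lines. DemoService and its echo method are
hypothetical stand-ins for the real Dubbo reference proxy obtained via
@Reference or <dubbo:reference>; the numbers it prints depend entirely
on your setup:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class ConsumerBench {
        // Stand-in for the remote service interface; replace with the
        // generated Dubbo proxy in a real test.
        interface DemoService { Object echo(Object payload); }

        public static void main(String[] args) throws Exception {
            DemoService demoService = p -> p;    // loopback stub
            byte[] payload = new byte[1024];     // ~1K object, as in the test
            for (int threads : new int[] {8, 16, 32, 48}) {
                System.out.printf("%d threads -> %.0f TPS%n",
                        threads, run(demoService, payload, threads));
            }
        }

        // Drive the call from a fixed-size pool for 10 seconds and
        // report calls per second.
        static double run(DemoService svc, Object payload, int threads)
                throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            AtomicLong calls = new AtomicLong();
            long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    while (System.nanoTime() < end) {
                        svc.echo(payload);       // synchronous RPC in a real test
                        calls.incrementAndGet();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            return calls.get() / 10.0;
        }
    }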
