Hey Carl,

Yup, the way you've described it is exactly how I have it set up. 

private def channelBuilder(address: String): ManagedChannel = {
  val interceptor = MonitoringClientInterceptor.create(Configuration.allMetrics())
  val builder = NettyChannelBuilder
    .forAddress(address, realtimePublishProxyConfig.realtimeInstanceConfig.realtimeServicePort)
    .intercept(interceptor)
    .executor(ForkJoinPool.commonPool())
    .eventLoopGroup(nioEventLoopGroup)

  clientSslContext match {
    case Some(sslContext) => builder.sslContext(sslContext).useTransportSecurity()
    case None => builder.usePlaintext(true)
  }

  builder.build()
}



I just got back from PTO, so I'm going to start looking into this again. I 
had a quick question that I couldn't find much information about after doing 
some searching. Currently we send one logical request per RPC. Would there 
be advantages to sending multiple requests in one RPC (i.e. batching)? Our 
server would likely convert the batch RPC back into individual requests, so 
we wouldn't be able to take advantage of batching throughout the system. My 
question is therefore more about gRPC itself and how it handles this.
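
For concreteness, here's roughly the shape I have in mind (purely 
hypothetical: none of these batch names exist in our protos today; it 
assumes a BatchPublishRequest message with a repeated requests field and a 
ScalaPB-generated async stub):

// Hypothetical batch API: N logical publishes carried in one RPC.
val batch = BatchPublishRequest(requests = pendingRequests)
stub.batchPublish(batch).map { _ =>
  // One set of HTTP/2 headers, one interceptor pass, and one server
  // dispatch for the whole batch instead of one per message. Since gRPC
  // already multiplexes concurrent RPCs over a single connection, the
  // saving would be per-RPC overhead rather than extra connections.
  logger.debug(s"Published ${batch.requests.size} messages in one RPC")
}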

Thanks!    

On Friday, August 31, 2018 at 4:06:18 PM UTC-7, Carl Mastrangelo wrote:
>
> It should be one ManagedChannel per host*.  There are a lot of batching 
> opportunities when using just one channel.  For example, TLS 
> encryption can work on larger block sizes at a time.  Another example is 
> that Netty can poll on fewer threads, meaning fewer wakeups across all your 
> threads.  Just to make sure we are on the same page:
>
> - Only use one ManagedChannel per "target"  (a.k.a. hostname)
> - Use a limited executor (like ForkJoinPool) for each channel.  You can 
> share this executor across channels if your RPCs don't block very long
> - Use a single Netty EventLoopGroup, and limit the number of loops.  You 
> can share the group across all your channels.
> - Use Netty tcnative for SSL / TLS if you aren't already.  This is 
> enormously faster than the SSL that comes with the JDK.
>
> These apply to the server as well.
>
>
> * there are rare cases to break this rule, but they don't sound like they 
> apply to your usage.
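>
> A rough sketch of that shared setup (a sketch only, with illustrative 
> names; it assumes grpc-netty so the Netty classes are visible, plus 
> netty-tcnative-boringssl-static on the classpath for the OpenSSL path):
>
> import java.util.concurrent.ForkJoinPool
> import io.grpc.ManagedChannel
> import io.grpc.netty.{GrpcSslContexts, NettyChannelBuilder}
> import io.netty.channel.nio.NioEventLoopGroup
> import io.netty.channel.socket.nio.NioSocketChannel
> import io.netty.handler.ssl.OpenSsl
>
> // One small, shared event loop group: fewer loops, fewer thread wakeups.
> val sharedGroup = new NioEventLoopGroup(2)
>
> // Confirm tcnative actually loaded; if false, TLS falls back to the JDK.
> println(s"OpenSSL available: ${OpenSsl.isAvailable}")
>
> // GrpcSslContexts prefers the OpenSSL provider when it is available.
> val sslContext = GrpcSslContexts.forClient().build()
>
> def channelFor(host: String, port: Int): ManagedChannel =
>   NettyChannelBuilder
>     .forAddress(host, port)                  // one channel per target
>     .eventLoopGroup(sharedGroup)             // shared across all channels
>     .channelType(classOf[NioSocketChannel])  // must match the NIO group
>     .sslContext(sslContext)
>     .executor(ForkJoinPool.commonPool())     // shared app executor
>     .build()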
>
> On Friday, August 31, 2018 at 2:02:18 PM UTC-7, Kos wrote:
>>
>> Hi Carl,
>>
>> I did a YourKit run against my service, and what I see is many threads 
>> being created for the event loop group; they're all named something like 
>> 'grpc-default-worker-ELG-...'. I did some reading on your other posts and 
>> saw you recommended using an ELG bounded to 1-2 threads. I tried this and 
>> saw our CPU utilization drop by about 10-15% with no loss in throughput! 
>>
>> This got me thinking, and I'm wondering if my problem is actually having 
>> too many managed channels. Some background: this service creates a managed 
>> channel for each node that it talks to (about 40 of them). For the traffic 
>> it receives, it does some filtering and sends the data down N of those 
>> channels. The outbound throughput is quite low, on the order of 10k/sec 
>> across all the channels. All of these managed channels now share the same 
>> NIO ELG, and they use the same ForkJoinPool.commonPool as you recommend. I'm 
>> wondering, though, if one managed channel per host is the correct approach 
>> here. Or should I make my own 'ManagedChannel' and create a subchannel per 
>> host?
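>>
>> For reference, here's roughly how I bounded the group (a sketch; the 
>> thread-name prefix is only there so the threads are easy to spot in the 
>> profiler):
>>
>> import io.netty.channel.nio.NioEventLoopGroup
>> import io.netty.util.concurrent.DefaultThreadFactory
>>
>> // 2 event loops instead of the much larger default; all ~40 channels
>> // now share this one group.
>> val nioEventLoopGroup =
>>   new NioEventLoopGroup(2, new DefaultThreadFactory("grpc-worker-elg", true))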
>>
>> Thanks!
>>
>>
>>
>>
>>
>> On Wednesday, August 29, 2018 at 2:28:00 PM UTC-7, Carl Mastrangelo wrote:
>>>
>>> More info is needed to figure out why this is slow.  Have you used 
>>> JProfiler or YourKit before?  There are a couple of Java profilers (perf, 
>>> even) that can tell you where the CPU time is going.  Also, you should 
>>> consider turning on GC logging (e.g. -verbose:gc -XX:+PrintGCDetails on 
>>> JDK 8, or -Xlog:gc on JDK 9+) to see if memory is being consumed too 
>>> fast.  
>>>
>>> Our tuned benchmarks get about 5000 qps per core, but it took profiling 
>>> before we could get that fast.  The general approach is to figure out 
>>> what's slow, and then fix that.  Without knowing what's slow for your 
>>> test, it's hard to recommend a fix.
>>>
>>> On Tuesday, August 28, 2018 at 2:14:31 PM UTC-7, kaus...@box.com wrote:
>>>>
>>>> Hi Carl,
>>>>
>>>> Thanks for responding! I've tried a couple of different executors and 
>>>> they don't seem to change the behavior. I've tried a FixedThreadPool with 
>>>> the number of threads = # of cores * 2, the ForkJoinPool.commonPool as 
>>>> you recommended, and the Scala global ExecutionContext, which is 
>>>> ultimately a ForkJoinPool as well. I've set this in the NettyServerBuilder 
>>>> as well as in the call to bind my service.  
>>>>
>>>> For some more information, here are the results from a Gatling test run 
>>>> that lasted 10 minutes using the commonPool. The server implementation 
>>>> now looks like this:
>>>>
>>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>>   serverInterceptor)
>>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>>   serverInterceptor
>>>> )
>>>>
>>>> NettyServerBuilder
>>>>   .forPort(8086)
>>>>   .sslContext(serverGrpcSslContexts)
>>>>   .addService(realtimeServiceWithMonitoring)
>>>>   .addService(batchPublishWithMonitoring)
>>>>   .addService(rppServiceWithMonitoring)
>>>>   .executor(ForkJoinPool.commonPool())
>>>>   .build()
>>>>
>>>>
>>>> My service implementation immediately returns Future.successful:
>>>>
>>>> override def publish(request: PublishRequest): Future[PublishResponse] = {
>>>>   logger.debug("Received Publish request: " + request)
>>>>   Future.successful(PublishResponse())
>>>> }
>>>>
>>>>
>>>> Test Results:
>>>>
>>>> ================================================================================
>>>> ---- Global Information --------------------------------------------------------
>>>> > request count                                     208686 (OK=208686  KO=0     )
>>>> > min response time                                    165 (OK=165     KO=-     )
>>>> > max response time                                   2997 (OK=2997    KO=-     )
>>>> > mean response time                                   287 (OK=287     KO=-     )
>>>> > std deviation                                        145 (OK=145     KO=-     )
>>>> > response time 50th percentile                        232 (OK=232     KO=-     )
>>>> > response time 75th percentile                        324 (OK=324     KO=-     )
>>>> > response time 95th percentile                        501 (OK=501     KO=-     )
>>>> > response time 99th percentile                        894 (OK=893     KO=-     )
>>>> > mean requests/sec                                347.231 (OK=347.231 KO=-     )
>>>> ---- Response Time Distribution ------------------------------------------------
>>>> > t < 800 ms                                        206014 ( 99%)
>>>> > 800 ms < t < 1200 ms                                1511 (  1%)
>>>> > t > 1200 ms                                         1161 (  1%)
>>>> > failed                                                 0 (  0%)
>>>> ================================================================================
>>>>
>>>> That's 347 requests/sec, and CPU utilization hovers between 29% and 35%. 
>>>>
>>>>
>>>> Thanks for your help!
>>>>
>>>>
>>>>
>>>> On Tuesday, August 28, 2018 at 12:49:38 PM UTC-7, Carl Mastrangelo 
>>>> wrote:
>>>>>
>>>>> Can you try setting  the executor on both the channel and the server 
>>>>> builder?   I would recommend ForkJoinPool.commonPool().
>>>>>
>>>>> On Monday, August 27, 2018 at 11:54:19 PM UTC-7, Kos wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using gRPC in a new Scala service and I'm seeing unexpectedly 
>>>>>> high CPU utilization. I see this high utilization in our production 
>>>>>> workload, but I am also able to reproduce it via performance tests, 
>>>>>> which I'll describe below. 
>>>>>>
>>>>>> My setup uses grpc-netty-shaded 1.10 (but I've also repro'd it with 
>>>>>> 1.14). My performance test uses mTLS to talk to the service. The 
>>>>>> service is deployed in a container with 6 cores and 2 GB RAM. I've 
>>>>>> reduced my service to immediately return a response without doing any 
>>>>>> other work, to identify whether it's the application or something to 
>>>>>> do with my gRPC configuration.
>>>>>>
>>>>>> My performance test issues about 250 requests a second using one 
>>>>>> ManagedChannel to one instance of my service. The data in each request 
>>>>>> is about 10 bytes. With this workload, my service is running at about 
>>>>>> 35% CPU, which I feel is far too high for such a small request rate.
>>>>>>
>>>>>> Here is how I've constructed my server:
>>>>>>
>>>>>> val serverInterceptor = MonitoringServerInterceptor.create(Configuration.allMetrics())
>>>>>>
>>>>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>>>>   serverInterceptor)
>>>>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>>>>   serverInterceptor
>>>>>> )
>>>>>>
>>>>>> val keyManagerFactory = GrpcSSLHelper.getKeyManagerFactory(sslConfig)
>>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>>> val serverGrpcSslContexts =
>>>>>>   GrpcSSLHelper.getServerSslContext(keyManagerFactory, trustManagerFactory)
>>>>>>
>>>>>> NettyServerBuilder
>>>>>>   .forPort(8086)
>>>>>>   .sslContext(serverGrpcSslContexts)
>>>>>>   .addService(realtimeServiceWithMonitoring)
>>>>>>   .addService(rppServiceWithMonitoring)
>>>>>>   .build()
>>>>>>
>>>>>>
>>>>>> The server interceptor is modeled after: 
>>>>>> https://github.com/grpc-ecosystem/java-grpc-prometheus
>>>>>>
>>>>>> The managed channel is constructed as follows:
>>>>>>
>>>>>> private val interceptor = MonitoringClientInterceptor.create(Configuration.allMetrics())
>>>>>>
>>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>>>
>>>>>> NettyChannelBuilder
>>>>>>   .forAddress(address, 8086)
>>>>>>   .intercept(interceptor)
>>>>>>   .negotiationType(NegotiationType.TLS)
>>>>>>   .sslContext(GrpcSSLHelper.getClientSslContext(keyManagerFactory, trustManagerFactory))
>>>>>>   .build()
>>>>>>
>>>>>>
>>>>>> Finally, I use non-blocking stubs to issue the gRPC request.
>>>>>>
>>>>>> Any help would be greatly appreciated. Thanks!
>>>>>> -K
>>>>>>
>>>>>
