In my situation, when I use mt_dop=24, I encountered the problem that some exchange operator take much time processing data. There are much data in the defered queue waiting for being deserialized, why no data be dequeue from the queue?
To speed up deserialized from defered queue, I change the config; datastream_service_num_deserialization_threads=80 datastream_service_deserialization_queue_size=50000 And in the same time, cpu utilization is only about 20%. EXCHANGE_NODE (id=17):(Total: 12s519ms, non-child: 7s804ms, % non-child: 62.34%) Node Lifecycle Event Timeline: 20s865ms - Open Started: 1s104ms (1s104ms) - Open Finished: 5s716ms (4s612ms) - First Batch Requested: 5s717ms (107.811us) - First Batch Returned: 5s739ms (22.135ms) - Last Batch Returned: 20s863ms (15s124ms) - Closed: 20s865ms (1.429ms) - ConvertRowBatchTime: 15.023ms - PeakMemoryUsage: 40.31 MB (42271020) - RowsReturned: 1.18M (1183889) - RowsReturnedRate: 94.56 K/sec Buffer pool: - AllocTime: 121.084ms - CumulativeAllocationBytes: 263.20 MB (275980288) - CumulativeAllocations: 8.28K (8276) - PeakReservation: 19.44 MB (20381696) - PeakUnpinnedBytes: 0 - PeakUsedReservation: 19.44 MB (20381696) - ReadIoBytes: 0 - ReadIoOps: 0 (0) - ReadIoWaitTime: 0.000ns - SystemAllocTime: 61.052ms - WriteIoBytes: 0 - WriteIoOps: 0 (0) - WriteIoWaitTime: 0.000ns Dequeue: - BytesDequeued (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 155.30 KB, 5.69 MB, 13.08 MB, 20.32 MB, 27.79 MB, 35.03 MB, 42.62 MB, 50.28 MB, 57.71 MB, 65.44 MB, 73.14 MB, 80.80 MB, 88.12 MB, 95.52 MB, 103.14 MB, 110.16 MB, 120.44 MB, 129.98 MB - FirstBatchWaitTime: 4s612ms - TotalBytesDequeued: 145.28 MB (152335476) - TotalGetBatchTime: 12s500ms - DataWaitTime: 4s714ms Enqueue: - BytesReceived (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11.02 MB, 23.38 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 24.85 MB, 25.05 MB, 25.78 MB, 27.08 MB, 30.66 MB, 37.86 MB, 45.58 MB, 50.16 MB, 51.69 MB, 52.16 MB, 52.65 MB, 54.85 MB, 61.37 MB, 69.15 MB, 70.72 MB, 71.36 MB, 71.50 MB, 71.56 MB - DeferredQueueSize (500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 324, 987, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 1.07K, 933, 775, 654, 649, 844, 1.06K, 1.10K, 988, 809, 635, 567, 799, 1.17K, 1.07K, 888, 563, 209 - DispatchTime: (Avg: 52.816us ; Min: 7.746us ; Max: 3.283ms ; Number of samples: 4138) - DeserializeRowBatchTime: 293.881ms - TotalBatchesEnqueued: 4.14K (4138) - TotalBatchesReceived: 4.14K (4138) - TotalBytesReceived: 71.56 MB (75036662) - TotalEarlySenders: 0 (0) - TotalEosReceived: 1.32K (1325) - TotalHasDeferredRPCsTime: 14s833ms - TotalRPCsDeferred: 3.87K (3871) CodeGen:(Total: 1s000ms, non-child: 1s000ms, % non-child: 100.00%) - CodegenInvoluntaryContextSwitches: 45 (45) - CodegenTotalWallClockTime: 1s000ms - CodegenSysTime: 4.000ms - CodegenUserTime: 828.000ms - CodegenVoluntaryContextSwitches: 487 (487) - CompileTime: 252.544ms - IrGenerationTime: 65.293ms - LoadTime: 0.000ns - ModuleBitcodeSize: 2.52 MB (2645424) - NumFunctions: 165 (165) - NumInstructions: 7.29K (7290) - OptimizationTime: 607.741ms - PeakMemoryUsage: 3.56 MB (3732480) - PrepareTime: 69.221ms