[ https://issues.apache.org/jira/browse/IMPALA-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154902#comment-17154902 ]

Tim Armstrong commented on IMPALA-9919:
---------------------------------------

It was a little tricky to compare since there was a plan change between the two 
queries, but I spot-checked a couple of operators where the time in ExecSummary 
was very different while the row count was similar.

The time spent is not accounted for anywhere obvious, but it *is* time that the 
thread spent in the SORT operator doing something. Since it shows up as neither 
user nor system time, the thread must have been blocked on something. It's 
notable that the CPU-intensive InMemorySort takes about the same time in both 
runs.

Anyway, based on experience, this is almost certainly memory allocation - 
contention on internal locks in TCMalloc. This is largely resolved by the JIRAs 
I referenced. There isn't really a straightforward workaround - we needed 
architectural changes to better handle complex queries like this on larger 
clusters. Those changes would probably also speed up these queries 
significantly even immediately after a restart.
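The "blocked" interpretation falls out of the thread counters in the first profile below: wall-clock time that is neither user nor system time is time the threads spent waiting. A minimal sketch of that arithmetic, using the raw nanosecond counts from the slow run:

{noformat}
# Blocked time = wall clock - sys - user, from the profile counters below.
wall_clock_ns = 2442515655405   # TotalThreadsTotalWallClockTime (40.7m)
sys_ns = 1303947604             # TotalThreadsSysTime (1.30s)
user_ns = 28054839187           # TotalThreadsUserTime (28.05s)

blocked_ns = wall_clock_ns - sys_ns - user_ns
print(f"blocked: {blocked_ns / 60e9:.1f} minutes")  # ~40.2 minutes
{noformat}

The result (~40.2 minutes) matches the SORT_NODE TotalTime almost exactly, which is why the blocking can be pinned on that operator.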

{noformat}
      - TotalThreadsInvoluntaryContextSwitches: 766 (766)
      - TotalThreadsTotalWallClockTime: 40.7m (2442515655405)
        - TotalThreadsSysTime: 1.30s (1303947604)
        - TotalThreadsUserTime: 28.05s (28054839187)
      - TotalThreadsVoluntaryContextSwitches: 29,022 (29022)
..
          SORT_NODE (id=47)
            - InMemorySortTime: 18.86s (18865775034)
            - InactiveTotalTime: 0ns (0)
            - InitialRunsCreated: 1 (1)
            - PeakMemoryUsage: 932.6 MiB (977910822)
            - RowsReturned: 11,591,696 (11591696)
            - RowsReturnedRate: 4799 per second (4799)
            - SortDataSize: 928.6 MiB (973702511)
            - SpilledRuns: 0 (0)
            - TotalMergesPerformed: 0 (0)
            - TotalTime: 40.2m (2414786419881)
            Buffer pool
              - AllocTime: 1.19s (1187293401)
              - CumulativeAllocationBytes: 932.5 MiB (977797120)
              - CumulativeAllocations: 466 (466)
              - InactiveTotalTime: 0ns (0)
              - PeakReservation: 932.5 MiB (977797120)
              - PeakUnpinnedBytes: 0 B (0)
              - PeakUsedReservation: 932.5 MiB (977797120)
              - ReadIoBytes: 0 B (0)
              - ReadIoOps: 0 (0)
              - ReadIoWaitTime: 0ns (0)
              - TotalTime: 0ns (0)
              - WriteIoBytes: 0 B (0)
              - WriteIoOps: 0 (0)
              - WriteIoWaitTime: 0ns (0)
{noformat}
{noformat}
      - TotalThreadsInvoluntaryContextSwitches: 651 (651)
      - TotalThreadsTotalWallClockTime: 4.1m (244059420184)
        - TotalThreadsSysTime: 864ms (864722666)
        - TotalThreadsUserTime: 28.47s (28474046354)
      - TotalThreadsVoluntaryContextSwitches: 26,537 (26537)
      - TotalTime: 4.1m (244499977327)
...          
          SORT_NODE (id=47)
            - InMemorySortTime: 20.03s (20032893244)
            - InactiveTotalTime: 0ns (0)
            - InitialRunsCreated: 1 (1)
            - PeakMemoryUsage: 954.4 MiB (1000803173)
            - RowsReturned: 11,861,343 (11861343)
            - RowsReturnedRate: 50154 per second (50154)
            - SortDataSize: 950.2 MiB (996352862)
            - SpilledRuns: 0 (0)
            - TotalMergesPerformed: 0 (0)
            - TotalTime: 3.9m (236522258387)
            Buffer pool
              - AllocTime: 162ms (162350359)
              - CumulativeAllocationBytes: 954.3 MiB (1000691029)
              - CumulativeAllocations: 477 (477)
              - InactiveTotalTime: 0ns (0)
              - PeakReservation: 954.3 MiB (1000691029)
              - PeakUnpinnedBytes: 0 B (0)
              - PeakUsedReservation: 954.3 MiB (1000691029)
              - ReadIoBytes: 0 B (0)
              - ReadIoOps: 0 (0)
              - ReadIoWaitTime: 0ns (0)
              - TotalTime: 0ns (0)
              - WriteIoBytes: 0 B (0)
              - WriteIoOps: 0 (0)
              - WriteIoWaitTime: 0ns (0)
{noformat}
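To make the comparison between the two runs concrete, here is a small sketch ratioing the SORT_NODE counters from the two profiles above (slow run first, healthy run second):

{noformat}
# Compare the slow and healthy SORT_NODE profiles above.
slow_total_ns, fast_total_ns = 2414786419881, 236522258387  # TotalTime
slow_rate, fast_rate = 4799, 50154                          # RowsReturnedRate

print(f"TotalTime ratio: {slow_total_ns / fast_total_ns:.1f}x")  # ~10.2x
print(f"Row-rate ratio:  {fast_rate / slow_rate:.1f}x")          # ~10.5x
{noformat}

Roughly a 10x slowdown on nearly identical row counts and in-memory sort times, consistent with the extra wall-clock time being spent blocked rather than doing more work.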

> Bad Impala Performance after a period of time
> ---------------------------------------------
>
>                 Key: IMPALA-9919
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9919
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>         Environment: OS: CentOS 6.9
>            Reporter: Vagelis Nomikos
>            Priority: Major
>              Labels: performance
>         Attachments: profiles.zip
>
>
> Our cluster consists of about 60 Impala nodes. After a period of time, and 
> after executing some "heavy" queries, the performance of the cluster 
> degrades and Impala eventually stops responding. We observed that, day 
> after day, Impala's resident memory and the number of running threads on 
> the machines keep growing even when we do not run queries. Every time we 
> restart Impala, everything works fine again for a period of time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
