Sarjeet Singh created MYRIAD-137:
------------------------------------

             Summary: Resources offered by Mesos are blocked by Myriad FWK after a NullPointerException or a flex-down of an FGS NM.
                 Key: MYRIAD-137
                 URL: https://issues.apache.org/jira/browse/MYRIAD-137
             Project: Myriad
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: Myriad 0.1.0
            Reporter: Sarjeet Singh


Observed this issue on two occasions: once after a flex-down of an FGS (fine-grained scaling) NM, and once after a NullPointerException occurred (JIRA MYRIAD-135).

From the Mesos UI, I observed that no resources were left to offer, even though nothing was running in the cluster except 3 NMs (2 MP, 1 ZP).

While debugging the RM logs, I found the NullPointerException, which caused the OfferEventHandler thread to exit; after that, no more offers flowed from Mesos to Myriad.
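The failure mode described above is a general one: if an event-consuming loop lets a runtime exception escape, the thread ends and every later event goes unhandled. Below is a minimal single-threaded sketch, assuming offers arrive as string IDs on a queue; `GuardedOfferLoop`, `run`, and the offer IDs are hypothetical names for illustration, not Myriad's actual OfferEventHandler code. It catches exceptions per event so that one bad offer cannot stop offer processing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical stand-in for an offer-handling loop; not Myriad's real OfferEventHandler. */
public class GuardedOfferLoop {

    /** Handles every queued offer ID and returns the IDs whose handler threw. */
    static List<String> run(Queue<String> offers, Consumer<String> handler) {
        List<String> failed = new ArrayList<>();
        String offerId;
        while ((offerId = offers.poll()) != null) {
            try {
                handler.accept(offerId);
            } catch (RuntimeException e) {
                // Without this catch, one NullPointerException would escape run(),
                // end the consuming thread, and leave every later offer unhandled.
                failed.add(offerId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Queue<String> offers = new ArrayDeque<>(List.of("offer-1", "offer-2", "offer-3"));
        List<String> handled = new ArrayList<>();
        List<String> failed = run(offers, id -> {
            if (id.equals("offer-2")) {
                // Simulated bug, e.g. a lookup that returned null for this offer.
                throw new NullPointerException("simulated NPE for " + id);
            }
            handled.add(id);
        });
        System.out.println("handled=" + handled + " failed=" + failed);
        // prints: handled=[offer-1, offer-3] failed=[offer-2]
    }
}
```

In a real scheduler, an offer whose handling failed would presumably also need to be declined back to Mesos; otherwise it stays held, much like the blocked offers listed in this report.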

I then restarted the RM, and the resources were returned to Mesos :)

Next, I ran a few MapReduce jobs and hit the issue again when flexing down an FGS NM: all resources offered to Myriad were blocked completely, and Myriad did not release any of them afterwards.

So it seems that the NM flex-down procedure cleans up only the active containers and the NM itself, but does not release outstanding offers that FGS NMs have saved to OfferLifeCycle for future tasks.
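A minimal sketch of the missing cleanup step, assuming held offers are tracked per host. `HeldOfferCache`, `hold`, `heldCount`, and `releaseHost` are hypothetical names, not Myriad's OfferLifeCycle API, and the offer IDs are placeholders. On flex-down, every offer still held for that host is removed and passed to a decline callback; in a real Mesos Java scheduler that callback would presumably wrap `SchedulerDriver.declineOffer(OfferID)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical per-host store of offers held back for future FGS containers. */
public class HeldOfferCache {
    private final Map<String, List<String>> offersByHost = new HashMap<>();

    /** Keeps an offer for later use by an FGS NM on the given host. */
    void hold(String host, String offerId) {
        offersByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(offerId);
    }

    int heldCount(String host) {
        return offersByHost.getOrDefault(host, List.of()).size();
    }

    /**
     * Assumed flex-down step: after the NM and its containers are gone,
     * hand every offer still held for that host back via the decline callback.
     */
    List<String> releaseHost(String host, Consumer<String> decline) {
        List<String> released = offersByHost.remove(host);
        if (released == null) {
            return List.of();
        }
        released.forEach(decline); // e.g. a wrapper around SchedulerDriver.declineOffer
        return released;
    }

    public static void main(String[] args) {
        HeldOfferCache cache = new HeldOfferCache();
        cache.hold("node101-116", "offer-1");
        cache.hold("node101-116", "offer-2");
        cache.hold("node101-117", "offer-3");
        List<String> declined = new ArrayList<>();
        cache.releaseHost("node101-116", declined::add);
        System.out.println("declined=" + declined
                + " stillHeldOn116=" + cache.heldCount("node101-116"));
        // prints: declined=[offer-1, offer-2] stillHeldOn116=0
    }
}
```

Keying held offers by host makes the flex-down cleanup a single map removal, so no offer for a removed NM can be left behind.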

Resources (from mesos-master UI)
================================

           CPUs                       Mem
Total      84                         253.9 GB
Used       3.300                      6.1 GB
Offered    80.700                     247.8 GB
Idle       -1.4210854715202004e-14    0 B         <------- No resources available (CPU value is rounding residue, i.e. 0).

Here are the active (*blocked*) offers shown on the Mesos UI:

Offers
======

ID                  Framework      Host           CPUs     Mem
…5050-3270-O4151    MyriadAlpha    node101-116    0.5      64 MB
…5050-3270-O4149    MyriadAlpha    node101-116    0.200    282 MB
…5050-3270-O4147    MyriadAlpha    node101-116    1        1.0 GB
…5050-3270-O4145    MyriadAlpha    node101-116    1        1.0 GB
…5050-3270-O4143    MyriadAlpha    node101-116    1        1.0 GB
…5050-3270-O4141    MyriadAlpha    node101-116    1        1.0 GB
…5050-3270-O4139    MyriadAlpha    node101-117    24.5     87.8 GB
…5050-3270-O4137    MyriadAlpha    node101-116    22.9     87.4 GB
…5050-3270-O4135    MyriadAlpha    node101-117    3        3.0 GB
…5050-3270-O4134    MyriadAlpha    node101-137    25.6     65.2 GB
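The CPU column of the Offers table sums to 80.7, matching the Offered row, and Used (3.3) plus Offered (80.7) equals Total (84). So every non-used CPU is tied up in offers held by Myriad, and the tiny negative Idle value is floating-point residue rather than real over-allocation. A minimal sketch that recomputes Idle = Total - Used - Offered with plain `double` arithmetic; the class name and the 1e-9 tolerance are choices made for this sketch, and the Mesos UI's own summation order is not known.

```java
/** Sketch: recompute the CPU "Idle" row from the figures in this report. */
public class IdleCpuCheck {
    // Tolerance chosen for this sketch; magnitudes below it are treated as zero.
    static final double EPSILON = 1e-9;

    // CPU column of the Offers table in this report, top to bottom.
    static final double[] OFFERED_CPUS = {0.5, 0.2, 1, 1, 1, 1, 24.5, 22.9, 3, 25.6};

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Idle = Total - Used - Offered, as in the Resources table. */
    static double idle(double total, double used, double offered) {
        return total - used - offered;
    }

    static boolean effectivelyZero(double x) {
        return Math.abs(x) < EPSILON;
    }

    public static void main(String[] args) {
        double offered = sum(OFFERED_CPUS);
        double idle = idle(84, 3.3, offered);
        // The exact printed residue depends on rounding; only its tiny magnitude matters.
        System.out.println("offered=" + offered + " idle=" + idle
                + " effectivelyZero=" + effectivelyZero(idle));
    }
}
```

Comparing against a tolerance instead of exact zero is the usual way to read such resource totals, since decimal CPU shares like 0.2 or 3.3 have no exact binary representation.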



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
