[ https://issues.apache.org/jira/browse/MYRIAD-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Santosh Marella resolved MYRIAD-137.
------------------------------------
    Resolution: Fixed

https://github.com/apache/incubator-myriad/commit/d2eaa4d163b79bd233614fe88e0a341d2fcdb126

> Resources offered by mesos are blocked with Myriad FWK on NullPointerException and FlexDown FGS NM.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MYRIAD-137
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-137
>             Project: Myriad
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Sarjeet Singh
>            Assignee: Santosh Marella
>             Fix For: Myriad 0.1.0
>
>
> Observed this issue on two occasions when I flexed down an FGS NM, and on
> another occasion when a NullPointerException occurred (JIRA MYRIAD-135).
> From the Mesos UI, I observed that no resources were left to offer, even
> though nothing was being utilized in the cluster except 3 NMs (2 MP, 1 ZP).
> On debugging the RM logs, I found the NullPointerException, which caused the
> OfferEventHandler thread to exit; Myriad received no more offers from Mesos
> after that.
> I then restarted the RM, and the resources were returned to Mesos :)
> Next, I ran a few MapReduce jobs and observed the issue when flexing down an
> FGS NM: all resources offered to Myriad were blocked, and Myriad did not
> release any of them afterwards.
> So it seems that the NM flex-down procedure only cleans up the active
> containers and the NM itself, but does not clean up outstanding offers when
> offers have been saved to OfferLifeCycle for future tasks by FGS NMs.
>
> Resources (From mesos-master UI)
> =========
>            CPUs                       Mem
> Total      84                         253.9 GB
> Used       3.300                      6.1 GB
> Offered    80.700                     247.8 GB
> Idle       -1.4210854715202004e-14    0 B        <------- No Resources available.
>
> Here are the active offers (*blocked*) shown on the Mesos UI:
> Offers
> =====
> ID                  Framework     Host          CPUs     Mem
> …5050-3270-O4151    MyriadAlpha   node101-116   0.5      64 MB
> …5050-3270-O4149    MyriadAlpha   node101-116   0.200    282 MB
> …5050-3270-O4147    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4145    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4143    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4141    MyriadAlpha   node101-116   1        1.0 GB
> …5050-3270-O4139    MyriadAlpha   node101-117   24.5     87.8 GB
> …5050-3270-O4137    MyriadAlpha   node101-116   22.9     87.4 GB
> …5050-3270-O4135    MyriadAlpha   node101-117   3        3.0 GB
> …5050-3270-O4134    MyriadAlpha   node101-137   25.6     65.2 GB

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
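
A note on the flex-down behavior described in the report: the short Python sketch below models the cleanup step the reporter found missing, namely handing back any offers held for a host when its node manager is flexed down. It is a toy model under stated assumptions, not Myriad's code and not the change made in the linked commit. `Offer`, `OfferStore`, `FakeDriver`, and `flex_down` are hypothetical names chosen for illustration, and the offer IDs and hosts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical, simplified resource offer."""
    offer_id: str
    host: str
    cpus: float
    mem_mb: float


@dataclass
class OfferStore:
    """Hypothetical stand-in for a store of offers held for future tasks."""
    held: dict = field(default_factory=dict)  # host -> list of Offer

    def hold(self, offer):
        # Keep the offer around instead of using or declining it right away.
        self.held.setdefault(offer.host, []).append(offer)

    def release_host(self, host):
        # Remove and return every offer currently held for this host.
        return self.held.pop(host, [])


class FakeDriver:
    """Records declined offer IDs; stands in for a scheduler driver."""

    def __init__(self):
        self.declined = []

    def decline_offer(self, offer_id):
        self.declined.append(offer_id)


def flex_down(host, store, driver):
    """Flex down the node manager on `host` and hand its held offers back.

    Stopping the containers and the node manager itself is omitted; only
    the offer cleanup step is modeled. Returns the number of offers declined.
    """
    released = store.release_host(host)
    for offer in released:
        driver.decline_offer(offer.offer_id)
    return len(released)


# Example with made-up offers: two held for node-a, one for node-b.
store = OfferStore()
driver = FakeDriver()
store.hold(Offer("O-1", "node-a", 1.0, 1024.0))
store.hold(Offer("O-2", "node-a", 0.5, 64.0))
store.hold(Offer("O-3", "node-b", 3.0, 3072.0))
declined_count = flex_down("node-a", store, driver)
```

In this model, flexing down node-a declines both of its held offers and leaves the offer held for node-b untouched. Without the `release_host` step, those offers would stay in the store and continue to show up as "Offered" on the master, which matches the symptom in the report.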