[ https://issues.apache.org/jira/browse/MYRIAD-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Santosh Marella reassigned MYRIAD-137:
--------------------------------------
Assignee: Santosh Marella
> Resources offered by mesos are blocked with Myriad FWK on
> NullPointerException and FlexDown FGS NM.
> ---------------------------------------------------------------------------------------------------
>
> Key: MYRIAD-137
> URL: https://issues.apache.org/jira/browse/MYRIAD-137
> Project: Myriad
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: Myriad 0.1.0
> Reporter: Sarjeet Singh
> Assignee: Santosh Marella
>
> Observed this issue in two instances when I flexed down an FGS NM. In
> another instance, it happened when a NullPointerException occurred
> (MYRIAD-135).
> From the Mesos UI, observed that no resources were left to offer even
> though nothing was being utilized in the cluster except 3 NMs (2 MP, 1 ZP).
> On debugging the RM logs, found the NullPointerException, which caused the
> OfferEventHandler thread to exit; Mesos sent no more offers to Myriad
> after that.
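
The failure mode described above (one uncaught exception ends the handler
thread, so no later offer is processed) can be illustrated with a minimal
sketch. GuardedHandler is a hypothetical wrapper written for this note, not
Myriad's OfferEventHandler, and it only assumes that offers reach the handler
one at a time through some callback:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper, not Myriad's OfferEventHandler: it runs a delegate
 * handler and keeps going when the delegate throws.
 */
final class GuardedHandler<E> implements Consumer<E> {
    private final Consumer<E> delegate;
    private final List<RuntimeException> failures = new ArrayList<>();

    GuardedHandler(Consumer<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(E event) {
        try {
            delegate.accept(event);
        } catch (RuntimeException e) {
            // Record the failure instead of letting it unwind the caller's
            // loop, so later events are still processed.
            failures.add(e);
        }
    }

    /** Failures seen so far, for logging or alerting. */
    List<RuntimeException> failures() {
        return failures;
    }
}
```

Catching the exception only keeps the thread alive. The root cause tracked in
MYRIAD-135 would still need its own fix, and an offer whose handling failed
would presumably still have to be declined so it does not stay held.
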
> Then I restarted the RM, and the resources were released back to Mesos :)
> Then I ran a few MapReduce jobs and observed the issue when flexing down an
> FGS NM: all of the resources offered to Myriad were blocked, and Myriad
> didn't release any of them after that.
> So it seems that the NM flex-down procedure only cleans up the active
> containers and the NM itself, but doesn't clean up outstanding offers in
> case offers were saved to OfferLifeCycle for future tasks by FGS NMs.
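
If that diagnosis is right, the missing step is to hand back any stashed
offers for a host when its NM is flexed down. A rough sketch of that
bookkeeping follows. StashedOffers, its methods, and the decliner callback are
all hypothetical names invented here, not Myriad's actual classes. In a real
scheduler the callback would presumably wrap Mesos'
SchedulerDriver.declineOffer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for the per-host offer store described above. */
final class StashedOffers {
    private final Map<String, List<String>> offerIdsByHost = new HashMap<>();
    private final Consumer<String> decliner; // assumed: declines one offer ID

    StashedOffers(Consumer<String> decliner) {
        this.decliner = decliner;
    }

    /** Remember an offer that is being held for future tasks on this host. */
    void stash(String host, String offerId) {
        offerIdsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(offerId);
    }

    /** Number of offers currently held for the host. */
    int outstanding(String host) {
        List<String> ids = offerIdsByHost.get(host);
        return ids == null ? 0 : ids.size();
    }

    /**
     * Intended to be called from the flex-down path: declines every offer
     * held for the host and forgets it. Returns how many were declined.
     */
    int releaseForHost(String host) {
        List<String> ids = offerIdsByHost.remove(host);
        if (ids == null) {
            return 0;
        }
        for (String id : ids) {
            decliner.accept(id);
        }
        return ids.size();
    }
}
```
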
> Resources (From mesos-master UI)
> =========
>          CPUs                     Mem
> Total    84                       253.9 GB
> Used     3.300                    6.1 GB
> Offered  80.700                   247.8 GB
> Idle     -1.4210854715202004e-14  0 B       <------- No Resources available.
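
The negative Idle CPU figure equals -2^-46, which looks like floating-point
rounding residue from subtracting Used and Offered from Total rather than a
real deficit. A small sketch of that arithmetic, with an assumed tolerance
(EPSILON is a value picked for illustration, not something Mesos defines):

```java
final class IdleResources {
    // Assumed tolerance: differences smaller than this are treated as zero.
    static final double EPSILON = 1e-9;

    /** Idle = total - used - offered, with rounding residue clamped to 0. */
    static double idle(double total, double used, double offered) {
        double raw = total - used - offered;
        return Math.abs(raw) < EPSILON ? 0.0 : raw;
    }

    public static void main(String[] args) {
        // CPU figures from the Resources table above.
        System.out.println(idle(84, 3.3, 80.7));
    }
}
```
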
> Here are the active (*blocked*) offers shown on the Mesos UI:
> Offers
> =====
> ID                Framework    Host         CPUs   Mem
> …5050-3270-O4151  MyriadAlpha  node101-116  0.5    64 MB
> …5050-3270-O4149  MyriadAlpha  node101-116  0.200  282 MB
> …5050-3270-O4147  MyriadAlpha  node101-116  1      1.0 GB
> …5050-3270-O4145  MyriadAlpha  node101-116  1      1.0 GB
> …5050-3270-O4143  MyriadAlpha  node101-116  1      1.0 GB
> …5050-3270-O4141  MyriadAlpha  node101-116  1      1.0 GB
> …5050-3270-O4139  MyriadAlpha  node101-117  24.5   87.8 GB
> …5050-3270-O4137  MyriadAlpha  node101-116  22.9   87.4 GB
> …5050-3270-O4135  MyriadAlpha  node101-117  3      3.0 GB
> …5050-3270-O4134  MyriadAlpha  node101-137  25.6   65.2 GB
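
As a cross-check, the CPU column of the offers listed above can be totalled
per host and compared with the Offered row of the Resources table. The sketch
below does that. The class and method names are made up for this note, and
the host and CPU values are copied by hand from the listing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OfferTotals {
    // Host and CPU columns of the ten offers, in the order listed.
    static final String[] HOSTS = {
        "node101-116", "node101-116", "node101-116", "node101-116",
        "node101-116", "node101-116", "node101-117", "node101-116",
        "node101-117", "node101-137"
    };
    static final double[] CPUS = {0.5, 0.2, 1, 1, 1, 1, 24.5, 22.9, 3, 25.6};

    /** Sums CPUs per host, keeping hosts in first-seen order. */
    static Map<String, Double> cpusByHost(String[] hosts, double[] cpus) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (int i = 0; i < hosts.length; i++) {
            totals.merge(hosts[i], cpus[i], Double::sum);
        }
        return totals;
    }

    /** Sums CPUs across all offers. */
    static double totalCpus(double[] cpus) {
        double sum = 0;
        for (double c : cpus) {
            sum += c;
        }
        return sum;
    }
}
```

Added up this way, the ten offers come to about 80.7 CPUs, which matches the
Offered row, so every CPU counted as offered is held by one of these offers
to MyriadAlpha.
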
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)