On 09/05/2012 06:09 PM, Iustin Pop wrote:
On Wed, Sep 05, 2012 at 12:54:30PM +0300, Constantinos Venetsanopoulos wrote:
Hello iustin,
any news on that? I assume you are quite busy these days..
Not so much busy as overloaded with many small things, so I forgot about
this.
Looking now at the LOCAL.data you provided me, I see that running hinfo
on it shows the known issue with KVM memory reporting:
Can you point me to the corresponding thread for this problem?
I think I missed it.
Cluster status:
F Name                 t_mem n_mem i_mem  x_mem  f_mem r_mem t_dsk f_dsk pcpu vcpu
- demo1.dev.grnet.gr       0     0     0      0      0     0     0     0    0    0
- demo2.dev.grnet.gr       0     0     0      0      0     0     0     0    0    0
- demo3.dev.grnet.gr       0     0     0      0      0     0     0     0    0    0
- demo4.dev.grnet.gr       0     0     0      0      0     0     0     0    0    0
  demo5.dev.grnet.gr  193811   429     0    310 193072     0  3800  3800   24    0
  demo6.dev.grnet.gr  193811   840     0    -97 193068  1024  3800  3780   24    0
  demo7.dev.grnet.gr  193811   836     0    -95 193070     0  3800  3800   24    0
  demo8.dev.grnet.gr  193811   381     0    395 193035     0  3800  3800   24    0
  demo9.dev.grnet.gr  193811  1630  1024   -830 191987     0  3800  3780   24    1
  demo10.dev.grnet.gr 193811  1257  1024  -1100 192630     0  3800  3780   24    1
  demo11.dev.grnet.gr 193811  8014 22528 -24579 187848  1024  3800  3640   24   22
  demo12.dev.grnet.gr 193811   364     0    379 193068  1024  3800  3780   24    0
  demo13.dev.grnet.gr 193811  1020  1024   -878 192645     0  3800  3780   24    1
It could be that having negative x_mem throws the statistics badly
off-track, but I'm not entirely sure.
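For reference, the x_mem figures in the table are consistent with x_mem = t_mem - n_mem - i_mem - f_mem, so x_mem goes negative whenever the reported free memory is larger than what remains after subtracting node and instance memory from the total. The sketch below recomputes the column for a few rows copied from the table. The formula is inferred from the numbers shown here, and the function name is only illustrative; it is not taken from the htools source.

```python
# Memory columns copied from the hinfo table above:
# name -> (t_mem, n_mem, i_mem, x_mem, f_mem)
NODES = {
    "demo5.dev.grnet.gr": (193811, 429, 0, 310, 193072),
    "demo6.dev.grnet.gr": (193811, 840, 0, -97, 193068),
    "demo9.dev.grnet.gr": (193811, 1630, 1024, -830, 191987),
    "demo11.dev.grnet.gr": (193811, 8014, 22528, -24579, 187848),
}


def unaccounted_mem(t_mem, n_mem, i_mem, f_mem):
    """Memory that is neither node, instance, nor free memory.

    Assumed relation, inferred from the table: it reproduces every
    x_mem value listed there.
    """
    return t_mem - n_mem - i_mem - f_mem


# Recompute x_mem for each row and pair it with the reported value.
recomputed = {
    name: (x_mem, unaccounted_mem(t_mem, n_mem, i_mem, f_mem))
    for name, (t_mem, n_mem, i_mem, x_mem, f_mem) in NODES.items()
}

for name, (reported, computed) in recomputed.items():
    print(f"{name}: reported={reported} computed={computed}")
```

On demo11, for example, f_mem + n_mem + i_mem adds up to 218390, which exceeds t_mem by 24579, hence the large negative value.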
If this is the case, wouldn't it also affect the drbd instances?
Oh oh, I think I know. This is debug output from an instrumented binary.
I'm using 'plain', by the way:
"For new-0 new primary demo5.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo6.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo7.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo8.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo9.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo10.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo11.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo12.dev.grnet.gr, score: 11.598841466014958"
"For new-0 new primary demo13.dev.grnet.gr, score: 11.598841466014958"
Note that all scores were identical, and we took the last one
arbitrarily.
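To illustrate how "the last one" can win a tie: a selection loop that compares with a non-strict "<=" lets each later candidate replace an equal earlier one. The sketch below is not hail's code. It assumes that a lower score is better, and pick_best is a hypothetical helper; the node names and the score are the ones from the debug output above.

```python
def pick_best(candidates):
    """Return the (node, score) pair with the lowest score.

    The comparison is non-strict (<=), so when scores are equal the
    later candidate replaces the earlier one.
    """
    best = None
    for node, score in candidates:
        if best is None or score <= best[1]:
            best = (node, score)
    return best


# Nine candidate nodes, all with the identical score seen for new-0.
candidates = [
    (f"demo{i}.dev.grnet.gr", 11.598841466014958) for i in range(5, 14)
]

last_wins = pick_best(candidates)
# Python's built-in min() returns the first minimal element instead.
first_wins = min(candidates, key=lambda c: c[1])

print(last_wins[0])   # demo13.dev.grnet.gr
print(first_wins[0])  # demo5.dev.grnet.gr
```

Either choice is equally valid when the scores really are identical; the point is only that the winner is then decided by the comparison, not by the cluster state.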
"For new-1 new primary demo5.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo6.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo7.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo8.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo9.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo10.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo11.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo12.dev.grnet.gr, score: 11.603271353529022"
"For new-1 new primary demo13.dev.grnet.gr, score: 11.603271353529022"
Again the same situation, and again and again:
"For new-11 new primary demo5.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo6.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo7.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo8.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo9.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo10.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo11.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo12.dev.grnet.gr, score: 17.804894914109706"
"For new-11 new primary demo13.dev.grnet.gr, score: 17.804894914109706"
So what is happening here is that the cluster is reasonably balanced,
and adding an instance results in the same score, anywhere we place it.
That's expected for instance "new-0". But after new-0, it shouldn't be
the same, so I suspect a bug in allocation of instances without
secondaries… but I don't see anything obvious when looking at the code.
If I understand correctly, you feed hail with the cluster info and
simulate the creation of an instance, so it can provide output?
If yes, when you add "new-0":
- is the cluster empty of instances? or
- does it already have the instances found in my LOCAL.data?
In the first case, the output seems reasonable when adding new-0.
In the latter case, shouldn't the scores also differ for new-0?
In either case, after the addition of new-0 the score should be
different indeed. So it seems we have a problem there.
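The expectation that scores must diverge once new-0 is placed can be illustrated with a toy model. This is only a sketch under loud assumptions: the score here is the population standard deviation of per-node memory usage, a simplified stand-in and not the metric hail actually computes, and the four-node cluster and the 1024 instance size are made up for the example.

```python
from statistics import pstdev


def score(mem_used):
    """Toy balance score: spread of memory usage across nodes (lower is better)."""
    return pstdev(mem_used)


def scores_after_placing(mem_used, size):
    """Score the cluster once for each node the new instance could land on."""
    results = []
    for i in range(len(mem_used)):
        trial = list(mem_used)  # copy, so each candidate starts from the same state
        trial[i] += size
        results.append(score(trial))
    return results


# Perfectly balanced (empty) cluster: every placement scores the same.
empty = [0, 0, 0, 0]
first = scores_after_placing(empty, 1024)

# After the first instance landed on node 0, the node list has changed,
# so placing the next instance on node 0 scores worse than elsewhere.
after_first = [1024, 0, 0, 0]
second = scores_after_placing(after_first, 1024)

print(first)
print(second)
```

In this model the first round yields one score for all nodes, while in the second round the already-loaded node scores worse than the empty ones, which is the behaviour one would expect from the second allocation onwards.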
I will investigate further, but it could be a number of things:
- a bug in the update of node list, which would make the most sense (but
I don't see one)
- accumulated rounding errors, but the disk/memory values are sane
- something else
I have to say, this is mighty interesting :)
More investigation it is then...
OK. I'll try to dive into a bit of Haskell too, but I'm not
making any promises yet :)
thanks a lot for investigating this,
Constantinos