Re: [code jam] Qualifications 2022 Twisty Little Passages Analysis Importance Sampling

porker2008 Thu, 21 Apr 2022 17:10:57 -0700

@Ricola,

The formula for weighted average is Sum(value * weight) / Sum(weight)


In your case, you have 500 room, with one passage, with weight 1 and you 
have 500 rooms with 99999 passages and with weight 500/99999
So Sum(value * weight) = 500 * 1 + 99999 * 500 / 99999 = 1000 and 
Sum(weight) = 500 * 1 + 500 * 500 / 99999 = 502.500025
So the weighted avg is 1000 / 502.500025 = 1.99004965224
And the estimate of total passages would be 1.99004965224 * 100000 / 2 
= 99502.4826118

在2022年4月20日星期三 UTC-7 10:48:57<Ricola> 写道：

> @erben
> > but the cumulative weight of this (1+1) is (1 + 1/99999). Since this 
> will happen all the time, our new estimation for the average degree will be 
> 2 / (1 + 1/99999) = 2
>
> I think you made a mistake here. This is the average degree for TWO rooms 
> since you have visited 2 rooms (the T one and the W one). So you average 
> degree is still one, which is the same than the simple approach.
>
> Let's say you can visit 1000 rooms. With the simple approach you get 1000 
> passages => av deg = 1=> estimation = 50000 . With the weighted approach 
> you get 500 rooms (W) with one passage and weight 1 and 500 rooms (T) with 
> 99999 passages and weight 1/99999 which gives 500 * 1 + 500 *99999/99999 = 
> 1000. You'll estimate 50000 again.
>
> The problem I am pointing out is that the weight and the value of the T 
> room cancel each other (sample * weight gives B * A/B = A).
>
> Le mercredi 20 avril 2022 à 15:40:09 UTC+1, erben...@gmail.com a écrit :
>
>> @Ricola: I think I figured out what caused the confusion. The example 
>> given in the analysis is unfortunate because it gives the same estimation 
>> for the average degree as the basic approach (without the 'W' steps.) But 
>> it's not always the case!
>>
>> I think it's better to examine what is happening with the "star-graph" 
>> which has one node connected to 99999 others and there are no other edges.
>> It has 99999 edges so we need a guess between 66666 and 133332.
>>
>> When we use the simple approach, with >90% probability we only sample 
>> degree = 1 nodes and we estimate the average degree as 1. So our guess for 
>> the number of edges will be 50000 which is out of the accepted range.
>>
>> Now let's see what happens when we use the T - W method. Let assume again 
>> that all the randomly sampled nodes have degree 1.
>> Now all the W steps move us into the center of the star. As you wrote, we 
>> will use weight 1/99999 for the degree 99999 so we get another 1, but the 
>> cumulative
>> weight of this (1+1) is (1 + 1/99999). Since this will happen all the 
>> time, our new estimation for the average degree will be 2 / (1 + 1/99999) = 
>> 2 - epsilon which will give us
>> a proper estimation for the total number of edges (a little less than 
>> 100000).
>>
>> If our random sample hits the center of the star we will compute a bad 
>> estimation, but in the contest problem we had 8000 steps so we could 
>> randomly sample at most 4% of the big graph and the probability of hitting 
>> the center node was low (less than 10%).
>> Péter Erben a következőt írta (2022. április 6., szerda, 17:11:14 UTC+2):
>>
>>> Hi Joe,
>>>
>>> Thanks for the explanation, though I think it does not explain the issue 
>>> raised by Ricola. Yes, we need to guarantee that "rare" high degrees are 
>>> properly accounted for. I think that part of the official analysis is very 
>>> good.
>>>
>>> What Ricola was asking is about the details of the "importance 
>>> sampling". As Ricola pointed out, the method described in the official 
>>> analysis cannot produce a different outcome than what we get when we only 
>>> use 'T' steps
>>> and estimate the average degree by simply using the average of our 
>>> random sample. And we already agreed on  that only using a random sample is 
>>> not enough when the degree distribution is skewed towards 
>>> low-degree nodes. Maybe the formula for the weights is incorrect?
>>>
>>> /dev/joe a következőt írta (2022. április 6., szerda, 2:27:14 UTC+2):
>>>
>>>> If, like me, you ignored walking and just teleported to a random sample 
>>>> of the rooms, you would have found that you got a perfect score on the 
>>>> cases in the local testing tool, but wrong answer on the real data.
>>>>
>>>> What the analysis is trying to say is that practically the only cases 
>>>> where you will fail are where a very small number of rooms have a very 
>>>> high 
>>>> degree, so few that you are likely to miss them in random sampling. 
>>>> Suppose, for instance, that rooms 1, 2, and 3 are connected to every other 
>>>> room (of 100,000). The other rooms are only connected to these three. When 
>>>> you randomly sample 8% of the rooms, the most likely result is you miss 
>>>> all 
>>>> these, think the average number of passages is 3, and estimate a result of 
>>>> 150,000 passages. In fact, the average is about 6, and the number of 
>>>> passages is near 300,000, and your estimate will be too low. What's worse, 
>>>> if you hit one of the high-degree rooms during your random sample, your 
>>>> 8000 rooms will have a total of about 124,000 passages and you will 
>>>> estimate the average is 15 and submit an estimate of 750,000, which is too 
>>>> high! The random sampling strategy is guaranteed to fail on this case!
>>>>
>>>> The sample doesn't make sense because the small number of rooms in it 
>>>> isn't enough to throw off the estimate the way the example I gave does. 
>>>> Hopefully that helps. Both strategies in the analysis are meant to help 
>>>> you 
>>>> correctly solve cases like these, while not messing up your average for 
>>>> normal cases.
>>>>
>>>> And, yes, I think this is rather in the dirty tricks department. They 
>>>> probably put it here in the qualification round, where there were three 
>>>> easy-to-medium problems and a fourth with an easy small case, so that 
>>>> nobody would feel they unfairly lost a round because of it. Either that, 
>>>> or 
>>>> it's a harbinger that all the rounds this year are going to have a problem 
>>>> with even more bizarre edge cases than usual that we are going to have to 
>>>> figure out.
>>>>
>>>>
>>>>
>>>> On Mon, Apr 4, 2022 at 10:46 AM Ricola <roelandt...@gmail.com> wrote:
>>>>
>>>>> In the analysis of Twisty Little Passages, In the "importance sampling 
>>>>> solution" it is said that we could give a weight of A/B of the sample got 
>>>>> with "W". But that sample value (degree) is B! So sample * weight gives B 
>>>>> * 
>>>>> A/B = A.
>>>>>
>>>>> 1. Why do we even need that sample as we don't even use its value, why 
>>>>> should we even query the degree of R2? We don't even need to walk.
>>>>> 2. This just ends up that instead of querying K rooms, we query K/2 of 
>>>>> them and count each degree twice. How is that even different than 
>>>>> computing 
>>>>> the average degree of K/2 rooms?
>>>>> If we look at the example given, 8/6 is the exact same result we get 
>>>>> if we take the average degrees of the rooms we got with the T queries(R1 
>>>>> : 
>>>>> 1, R2 : 1, R3 : 2 => avg 4/3)
>>>>>
>>>>> Either I misunderstood the explanation  or something is wrong there, 
>>>>> as implementing that solution gives wrong answer, which is normal as it's 
>>>>> the same than using the average degree of the samples. 
>>>>>
>>>>> -- 
>>>>> -- You received this message because you are subscribed to the Google 
>>>>> Groups Code Jam group. To post to this group, send email to 
>>>>> googl...@googlegroups.com. To unsubscribe from this group, send email 
>>>>> to google-code...@googlegroups.com. For more options, visit this 
>>>>> group at https://groups.google.com/d/forum/google-code?hl=en
>>>>> --- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Google Code Jam" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to google-code...@googlegroups.com.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/google-code/6d8b752e-17e7-4eda-8049-25a38dfe9b46n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/google-code/6d8b752e-17e7-4eda-8049-25a38dfe9b46n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>

-- 
-- You received this message because you are subscribed to the Google Groups 
Code Jam group. To post to this group, send email to 
google-code@googlegroups.com. To unsubscribe from this group, send email to 
google-code+unsubscr...@googlegroups.com. For more options, visit this group at 
https://groups.google.com/d/forum/google-code?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Google Code Jam" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-code+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-code/f74d62f9-95cd-43bb-9ec1-5e3891e1b973n%40googlegroups.com.

Re: [code jam] Qualifications 2022 Twisty Little Passages Analysis Importance Sampling

Reply via email to