Nope, just dope, not /a/ dope!

On 23 February 2015 at 10:14, Dan Andreescu <[email protected]> wrote:
> Sorry - I'm a dope :)
>
> On Mon, Feb 23, 2015 at 9:35 AM, Andrew Otto <[email protected]> wrote:
>>
>> > We should address automatic duplicate cleaning very soon, as Christian
>> > warned a while ago.  He manually cleaned up duplicates a few times but we
>> > know it's a problem that needs solving.
>>
>> Duplicates are already cleaned up, in the refined table.  There should
>> never be any duplicates in the wmf.webrequest table.
>>
>> https://gerrit.wikimedia.org/r/#/c/177522/
>>
>> Seeing as this was merged on Jan 26, it is possible that it was not yet
>> deployed on Jan 27, when Oliver noticed duplicates.
>>
>> > We should be calculating a per-host arithmetic series over the sequence
>> > numbers when data is loaded.
>>
>> Please see the wmf_raw.webrequest_sequence_stats tables, for hourly
>> partition statistics, including duplicates and losses.
>>
>> -Ao
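
The per-host sequence check discussed above can be sketched roughly as
follows. This is only an illustration, not the actual Hive job behind
wmf_raw.webrequest_sequence_stats: the function name, the output field
names and the host names are hypothetical, and it assumes each host
numbers its requests consecutively with no counter reset inside the
hour.

```python
from collections import defaultdict


def sequence_stats(records):
    """Summarise duplicates and losses per host.

    records: iterable of (hostname, sequence) pairs for one hourly
    partition. Field names in the result are illustrative only.
    """
    by_host = defaultdict(list)
    for hostname, sequence in records:
        by_host[hostname].append(sequence)

    stats = {}
    for hostname, seqs in by_host.items():
        distinct = set(seqs)
        # With consecutive numbering, the hour should contain every
        # value between the smallest and largest sequence number seen.
        expected = max(distinct) - min(distinct) + 1
        stats[hostname] = {
            "count_actual": len(seqs),
            "count_expected": expected,
            "count_duplicate": len(seqs) - len(distinct),
            "count_lost": expected - len(distinct),
        }
    return stats


# Hypothetical sample: cp1001 repeats 11 and skips 12 and 13.
sample = [
    ("cp1001", 10), ("cp1001", 11), ("cp1001", 11), ("cp1001", 14),
    ("cp2002", 7), ("cp2002", 8),
]
for host, row in sorted(sequence_stats(sample).items()):
    print(host, row)
```

A host whose duplicate and lost counts are both zero has a clean
partition under these assumptions; anything else is worth a closer
look before trusting counts from that hour.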
>>
>>
>>
>>
>> On Feb 23, 2015, at 09:01, Dan Andreescu <[email protected]> wrote:
>>
>> We should address automatic duplicate cleaning very soon, as Christian
>> warned a while ago.  He manually cleaned up duplicates a few times but we
>> know it's a problem that needs solving.
>>
>> On Mon, Feb 23, 2015 at 6:22 AM, Christian Aistleitner
>> <[email protected]> wrote:
>>>
>>> Hi Oliver,
>>>
>>> On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
>>> > And, an additional point; I don't understand why, if dupes is the
>>> > problem, the Hive query was not hit as badly by this as the equivalent
>>> > UDF.
>>>
>>> just shooting in the dark, since you did not provide your query, but
>>> if you by accident had been querying the
>>>
>>>   wmf_raw.webrequest
>>>
>>> (database name ending in “_raw”) table instead of
>>>
>>>   wmf.webrequest
>>>
>>> (no “_raw” in the database name), the difference you described would
>>> be plausible (and given the patching of GHOST, they'd even be
>>> expected).
>>>
>>>
>>> Have fun,
>>> Christian
>>>
>>>
>>>
>>> --
>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>>>                            Companies' registry: 360296y in Linz
>>> Christian Aistleitner
>>> Kefermarkterstrasze 6a/3     Email:  [email protected]
>>> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>>>                              Fax:            +43 7946 / 20 5 81
>>>                              Homepage: http://quelltextlich.at/
>>> ---------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>
>



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
