Nope, just dope, not /a/ dope!

On 23 February 2015 at 10:14, Dan Andreescu <[email protected]> wrote:
> Sorry - I'm a dope :)
>
> On Mon, Feb 23, 2015 at 9:35 AM, Andrew Otto <[email protected]> wrote:
>>
>> We should address automatic duplicate cleaning very soon, as Christian
>> warned a while ago. He manually cleaned up duplicates a few times but we
>> know it's a problem that needs solving.
>>
>> Duplicates are already cleaned up, in the refined table. There should
>> never be any duplicates in the wmf.webrequest table.
>>
>> https://gerrit.wikimedia.org/r/#/c/177522/
>>
>> Seeing as this was merged on Jan 26, it is possible that it was not
>> deployed on Jan 27, when Oliver is noticing duplicates.
>>
>> We should be calculating a per-host arithmetic series over the sequence
>> numbers when data is loaded.
>>
>> Please see the wmf_raw.webrequest_sequence_stats tables, for hourly
>> partition statistics, including duplicates and losses.
>>
>> -Ao
>>
>> On Feb 23, 2015, at 09:01, Dan Andreescu <[email protected]> wrote:
>>
>> We should address automatic duplicate cleaning very soon, as Christian
>> warned a while ago. He manually cleaned up duplicates a few times but we
>> know it's a problem that needs solving.
>>
>> On Mon, Feb 23, 2015 at 6:22 AM, Christian Aistleitner
>> <[email protected]> wrote:
>>>
>>> Hi Oliver,
>>>
>>> On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
>>> > And, an additional point; I don't understand why, if dupes is the
>>> > problem, the Hive query was not hit as badly by this as the equivalent
>>> > UDF.
>>>
>>> just shooting in the dark, since you did not provide your query, but
>>> if you by accident had been querying the
>>>
>>>   wmf_raw.webrequest
>>>
>>> (database name ending in “_raw”) table instead of
>>>
>>>   wmf.webrequest
>>>
>>> (no “_raw” in the database name), the difference you described would
>>> be plausible (and given the patching of GHOST, they'd even be
>>> expected).
>>>
>>> Have fun,
>>> Christian
>>>
>>> --
>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>>> Companies' registry: 360296y in Linz
>>> Christian Aistleitner
>>> Kefermarkterstrasze 6a/3     Email:  [email protected]
>>> 4293 Gutau, Austria          Phone:  +43 7946 / 20 5 81
>>>                              Fax:    +43 7946 / 20 5 81
>>>                              Homepage: http://quelltextlich.at/
>>> ---------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
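
For what it's worth, the per-host sequence check mentioned in the quoted thread could look roughly like the sketch below. This is only an illustration under assumptions: it takes (hostname, sequence) pairs for one hourly partition, and the function name and result keys are made up here, not the actual columns of wmf_raw.webrequest_sequence_stats. The idea is that a gap-free run from the lowest to the highest sequence number has a known length and a known arithmetic-series sum, so duplicates and losses show up as differences from those.

```python
from collections import defaultdict


def sequence_stats(rows):
    """Summarise duplicates and losses per host.

    rows: iterable of (hostname, sequence) pairs for one partition.
    The result keys are illustrative names, not a real table schema.
    """
    per_host = defaultdict(list)
    for hostname, sequence in rows:
        per_host[hostname].append(sequence)

    stats = {}
    for hostname, seqs in per_host.items():
        lo, hi = min(seqs), max(seqs)
        expected = hi - lo + 1      # length of a gap-free run lo..hi
        distinct = set(seqs)
        stats[hostname] = {
            "expected": expected,
            "actual": len(seqs),
            # rows that repeat an already-seen sequence number
            "duplicates": len(seqs) - len(distinct),
            # sequence numbers between lo and hi that never arrived
            "missing": expected - len(distinct),
            # arithmetic series: sum of lo..hi is n * (lo + hi) / 2
            "expected_sum": expected * (lo + hi) // 2,
            "distinct_sum": sum(distinct),
        }
    return stats
```

A host with no duplicates and no losses would have actual equal to expected and distinct_sum equal to expected_sum; anything else is worth a closer look.
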
--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
