We should address automatic duplicate cleaning very soon, as Christian warned a while ago. He has manually cleaned up duplicates a few times, but we know it's a problem that needs solving.
On Mon, Feb 23, 2015 at 6:22 AM, Christian Aistleitner <[email protected]> wrote:

> Hi Oliver,
>
> On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
> > And, an additional point; I don't understand why, if dupes is the
> > problem, the Hive query was not hit as badly by this as the equivalent
> > UDF.
>
> just shooting in the dark, since you did not provide your query, but
> if you by accident had been querying the
>
>   wmf_raw.webrequest
>
> (database name ending in “_raw”) table instead of
>
>   wmf.webrequest
>
> (no “_raw” in the database name), the difference you described would
> be plausible (and given the patching of GHOST, they'd even be
> expected).
>
>
> Have fun,
> Christian
>
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3
> 4293 Gutau, Austria
> Email: [email protected]
> Phone: +43 7946 / 20 5 81
> Fax: +43 7946 / 20 5 81
> Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
