We should address automatic duplicate cleaning very soon, as Christian
warned a while ago.  He has manually cleaned up duplicates a few times, but
we know it's a problem that needs solving.

On Mon, Feb 23, 2015 at 6:22 AM, Christian Aistleitner <
[email protected]> wrote:

> Hi Oliver,
>
> On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
> > And an additional point: I don't understand why, if dupes are the
> > problem, the Hive query was not hit as badly by this as the equivalent
> > UDF.
>
> Just shooting in the dark, since you did not provide your query, but
> if you had accidentally been querying the
>
>   wmf_raw.webrequest
>
> (database name ending in “_raw”) table instead of
>
>   wmf.webrequest
>
> (no “_raw” in the database name), the difference you described would
> be plausible (and given the patching of GHOST, they'd even be
> expected).
>
>
> Have fun,
> Christian
>
>
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                            Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  [email protected]
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                              Fax:            +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>