On 22 March 2018 at 13:41, Neil Patel Quinn <[email protected]> wrote:
> Both the edit data and pageview data that you're talking about come from
> the Hadoop-based Analytics Data Lake
> <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake>. However,
> because of limitations in the underlying MediaWiki application databases
> <https://www.mediawiki.org/wiki/Manual:Database_layout> *that Hive pulls
> edit data from*, the data requires some complex reconstruction and
> denormalization
> <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Data_Lake/Edits/Pipeline>
> that takes several days to a week.

Sorry, I garbled that a little. It's more correct to say: "because of limitations in the underlying MediaWiki application databases *that are the source of the edit data*, the data requires..."

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
