Hi all! I haven't heard any objections, so I will be stopping the jobs that generate these datasets today. I won't delete the existing data until further notice.
On Mon, Dec 14, 2015 at 1:34 PM, Andrew Otto <[email protected]> wrote:
> If we don’t hear any objections by Dec 30th, we will move forward with the
> plan to no longer generate this data.
>
> On Dec 11, 2015, at 12:40, Andrew Otto <[email protected]> wrote:
>
> Hi all,
>
> Soon, we will be merging the mobile web cache requests with the text cache
> requests. text caches will now serve requests for mobile web[1].
>
> This means that the webrequest_source=‘mobile’ partition in the webrequest
> table in Hive will soon be empty, and all data that was previously in it
> will be found in the webrequest_source=‘text’ partition.
>
> There are only 3 datasets that currently only use the
> webrequest_source=‘mobile’ partition:
>
> - /a/log/webrequest/archive/mobile
> - /a/log/webrequest/archive/5xx-mobile
> - /a/log/webrequest/archive/zero
>
> (These are paths on stat1002, but they also exist in HDFS.)
>
> These datasets originally came from udp2log, but since early last year
> they have been generated from Hadoop. With the upcoming cache merge, these
> jobs will have to parse through all text requests, which will make Hadoop
> busier.
>
> Do we know if these are being used? Would anyone be upset if we no longer
> generated these datasets?
>
> Thanks!
> -Andrew
>
> [1] https://phabricator.wikimedia.org/T109286
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
