We use a lot of time sensors like this, for reports that shouldn't be sent to a third party before a certain time of day. Since these sensors are themselves tasks, they can fail to be scheduled, or can fail outright if the underlying worker instance dies. I would recommend double-checking your concurrency settings (especially since you will have multiple days' worth of DAG runs executing concurrently) and your retry settings.
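The delay described above boils down to a sensor-style readiness check: succeed only once enough wall-clock time has passed after the end of the daily interval. A minimal stdlib sketch of that check (function name and signature are mine, not Airflow's actual sensor API):

```python
from datetime import datetime, timedelta

def poke(execution_date, delay_hours=72, now=None):
    """Sketch of a time-sensor check.

    `execution_date` is the left bound of the daily interval being
    processed. The check succeeds only once `delay_hours` have passed
    after the *end* of that interval (execution_date + 1 day).
    """
    now = now or datetime.utcnow()
    interval_end = execution_date + timedelta(days=1)
    ready_at = interval_end + timedelta(hours=delay_hours)
    return now >= ready_at

# Monday's interval (execution_date 2018-06-04) ends Tuesday 00:00,
# so with a 72-hour delay it becomes ready at Friday 00:00.
```

Because this is just a boolean check, it maps directly onto a sensor's poke loop; the retry settings mentioned above matter because the task hosting this loop can itself die before the condition is met.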
On Tue, Jun 5, 2018 at 10:34 AM, Pedro Machado <pe...@205datalab.com> wrote:

> Thanks, Max!
>
> On Mon, Jun 4, 2018 at 12:47 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>
> > The common standard is to have the execution_date aligned with the
> > partition date in the database (say 2018-08-08) and contain data from
> > 2018-08-08T00:00:00 to 2018-08-08T23:59:59.999.
> >
> > The partition date and execution_date match and correspond to the left
> > bound of the time interval processed.
> >
> > Then you'd use some sensors to make sure this cannot run until the
> > desired time or conditions are met.
> >
> > Max
> >
> > On Mon, Jun 4, 2018 at 5:46 AM Pedro Machado <pe...@205datalab.com> wrote:
> >
> > > Hi. What is the recommended way to deal with data latency? For
> > > example, I have a feed that is not considered final until 72 hours
> > > have passed after the end of the daily period.
> > >
> > > For example, Monday's data would be ready by Thursday at 23:59.
> > >
> > > Should I pull data based on the execution date minus a 72 hour offset
> > > or use the execution date and somehow delay the data pull for 72 hours?
> > >
> > > The latter would be more intuitive (data pull date = execution date)
> > > but I am not sure if it's a good pattern.
> > >
> > > Thanks,
> > >
> > > Pedro
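The two options Pedro contrasts in the quoted thread can be sketched side by side (function names are mine, for illustration only): with an offset, the run dated D pulls the partition for D minus 72 hours, so partition date and execution_date diverge; with the sensor approach Max recommends, the run simply waits until its own interval's data is final, and partition date equals execution_date.

```python
from datetime import datetime, timedelta

# Option A: offset the pull. The run with execution_date D pulls the
# partition dated D - 72h, so the two dates no longer match.
def partition_with_offset(execution_date, offset_hours=72):
    return (execution_date - timedelta(hours=offset_hours)).date()

# Option B: delay via a sensor. The run waits until D's data is final,
# then pulls partition D; partition date == execution_date.
def partition_with_sensor(execution_date):
    return execution_date.date()
```

Option B is the pattern endorsed in the thread: it keeps the execution_date aligned with the partition's left bound, at the cost of runs sitting in a waiting state until the data is final.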