Yeah. That discussion actually made me think that probably we need to explain it better :)
On Sun, Feb 6, 2022 at 11:10 PM Howard Yoo <[email protected]> wrote:

> As we discuss this topic, I get to understand more and more of the
> reasons behind all those philosophies, so I appreciate the knowledge
> that I gained.
>
> As long as those terms and principles are well described and explained
> without confusion, I believe we are moving in the right direction and
> that’s what matters.
>
> - Howard
>
> Sent from my iPhone
>
> On Feb 6, 2022, at 3:24 PM, Jarek Potiuk <[email protected]> wrote:
>
> IMHO it does not really matter if they are the same or not and which one
> is the same. This is actually the beauty of the "abstract" and "vague"
> logical_date. Those are different "concepts" that you use in different
> cases.
>
> The logical date **might** be the same as one of the interval_dates. It's
> just an "abstract" representation of the particular "run_id" - and you
> should not care, because "logical_date" makes sense for some cases, but
> "data_interval_start/end" for other cases.
>
> * If your task is about "data_interval" - by all means use the
> data_interval_start and end.
> * If your task is not about "interval" - use the "logical_date".
>
> That is how I see it at least. By using a different approach for
> different cases, the users might free their "mental-mapping" - they do not
> have to map the "logical_date" to either "start" or "end". It does not
> matter. But if they process a data interval, they have very clear
> boundaries of the ("start" <-> "end") range that they can use without even
> thinking about how "logical_date" maps to it.
>
> For me - those are completely different cases and they are orthogonal to
> each other (even if some of those values are the same).
>
> J.
>
> On Sun, Feb 6, 2022 at 7:00 PM Howard Yoo <[email protected]> wrote:
>
>> I see, thank you for the info.
>> I didn’t know about the existence of the data_interval_start and end
>> dates.
>> I briefly looked at those definitions, and was wondering… wouldn’t
>> they be equal to the logical dates? I do see those variables mentioned in
>> https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html,
>> and also see the ds and ts meaning logical dates. In practice, are those
>> dates and timestamps supposed to be the same?
>>
>> I also wonder if the ‘data_’ prefix would be necessary if airflow were
>> used to orchestrate far more things in the future (perhaps this may be
>> another thread), but in general, we should have continuous discussions to
>> further clearly define all those dates for the improved usage of airflow.
>>
>> Howard
>>
>> Sent from my iPhone
>>
>> On Feb 6, 2022, at 11:15 AM, Jarek Potiuk <[email protected]> wrote:
>>
>> We already have `data_interval_start` and `data_interval_end` as fields,
>> and we need something else that can have more "abstract" meaning to apply
>> to the whole run as "single thing". Using interval_date would be a bit
>> ambiguous.
>>
>> "Did you mean start or end actually when you mentioned interval date?" -
>> is the question that I anticipate happening a lot if we mix those.
>>
>> J.
>>
>> On Sun, Feb 6, 2022 at 6:04 PM Howard Yoo <[email protected]> wrote:
>>
>>> Now I can understand why the data_date may not be a perfect fit to
>>> describe the term.
>>>
>>> This is not to be against the logical_date, but what about
>>> ‘interval_date’? We have the schedule interval, which defines the duration
>>> of the interval (e.g. 1 day), so wouldn’t interval start and end date be a
>>> better representation of it rather than the logical date?
>>>
>>> Just want to hear whether that has been brought up already or not.
>>>
>>> Howard
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 6, 2022, at 10:25 AM, Jarek Potiuk <[email protected]> wrote:
>>>
>>> I wholeheartedly agree with TP on that one.
>>> I think while some time ago "data date" could make sense, Airflow's
>>> future is much more than just processing data intervals.
>>> This is the primary use case and this is where Airflow shines, of course,
>>> but there are other ways Airflow is used out there that, while we
>>> are not really encouraging them, are not only legitimate, but also
>>> something that I hope Airflow will treat as first-class citizens soon (and
>>> it kind of already does with custom timetables).
>>>
>>> Just an example here - for me, one of the most eye-opening talks at last
>>> year's Airflow Summit was
>>> https://airflowsummit.org/sessions/2021/provision-as-a-service/
>>> In this talk Cloudflare engineers explain how they manage the Cloudflare
>>> infrastructure using Airflow.
>>>
>>> The "data date" has no meaning in this case. But the "logical date"
>>> (which is the vaguest-possible one as TP explained) continues to have one.
>>> This is the "logical date of the infrastructure provisioning". Thanks
>>> to Airflow (as I understand it) Cloudflare is able to re-provision their
>>> services to "yesterday's logical date infrastructure" today - for example.
>>>
>>> That would not fly with "data date".
>>>
>>> J.
>>>
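
The distinction drawn in this thread (use data_interval_start/end when a task processes an interval, use logical_date when it does not) can be illustrated with a rough sketch. This is plain Python, not Airflow code: the helper names build_run_context, rows_in_interval and provision_label are hypothetical, and only the keys data_interval_start, data_interval_end and logical_date are taken from the thread. The sketch also assumes that the logical date happens to coincide with the interval start and that the interval is half-open; as noted above, tasks should not rely on either.

```python
from datetime import datetime, timedelta, timezone


def build_run_context(interval_start, interval_length):
    """Hypothetical stand-in for the per-run context a task receives."""
    return {
        "data_interval_start": interval_start,
        "data_interval_end": interval_start + interval_length,
        # Assumption for this sketch only: the logical date coincides
        # with the start of the interval. Tasks should not depend on it.
        "logical_date": interval_start,
    }


def rows_in_interval(context, rows):
    """Interval-oriented task: only the explicit boundaries are used."""
    start = context["data_interval_start"]
    end = context["data_interval_end"]
    # Assumption: the interval is treated as half-open, [start, end).
    return [row for row in rows if start <= row["ts"] < end]


def provision_label(context):
    """Task that is not about an interval: only the logical date is used."""
    return f"infra-{context['logical_date']:%Y-%m-%d}"


start = datetime(2022, 2, 5, tzinfo=timezone.utc)
context = build_run_context(start, timedelta(days=1))

rows = [
    {"id": 1, "ts": datetime(2022, 2, 4, 23, 0, tzinfo=timezone.utc)},
    {"id": 2, "ts": datetime(2022, 2, 5, 12, 0, tzinfo=timezone.utc)},
    {"id": 3, "ts": datetime(2022, 2, 6, 0, 0, tzinfo=timezone.utc)},
]

selected = rows_in_interval(context, rows)
label = provision_label(context)
```

Neither function needs to know how logical_date maps onto the interval: the first reads only the boundaries, the second reads only the logical date.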
