Repro steps:
- Create a DAG with a dummy task
- Let this DAG run for one dagrun
- Add a new subdag operator that contains a dummy operator to this DAG that
has depends_on_past set to true
- click on the white square for the new subdag operator in the DAGs first
dagrun
- Click "Zoom into subdag" (takes you to the graph view for the subdag)
- Click the dummy task in the graph view and click "Mark Success"
- Observe that the list of tasks to mark as success is empty (it should
contain the dummy task)

On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> wrote:

> Dan
>
> Can you elaborate on 2, cause I thought I specifically took care of that.
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
> > On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]>
> wrote:
> >
> > I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
> > pending issues.
> >
> > There are two more issues we found which I included there:
> > 1. Task instances that have their state manually set to running make the
> UI
> > for their DAG unable to parse
> > 2. Mark success doesn't work for non existent task instances/dagruns
> which
> > breaks the subdag use case (setting tasks as successful via the graph
> view)
> >
> >> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]>
> wrote:
> >>
> >> Hey Max
> >>
> >> It is massive for sure. Sorry about that ;-). However it is not as
> massive
> >> as you might deduct from a first view. 0) run tasks concurrently across
> dag
> >> runs 1) ordering of the tasks was added to the loop. 2) calculating of
> >> deadlocks, running tasks, tasks to run was corrected, 3) relying on the
> >> executor for status updates was replaced, 4) (tbd) executor failure
> check
> >> to protect against endless Ioops.
> >>
> >> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a
> >> subtle change, that touches a couple of lines to pop/push properly. 3)
> is
> >> bigger, as I didn't like the reliance on the executor. 4) is old code
> that
> >> needs to be added again.
> >>
> >> I probably can leave out 3 which makes 4 mood. The change would be
> >> smaller. Maybe I could even completely remove 3 and just add 4. What are
> >> your thoughts?
> >>
> >> The random failures we were seeing were the "implicit" test of not a
> >> executing in the right order and then deadlocking. But no explicit tests
> >> exist. Help would definitely be appreciated.
> >>
> >> Yes I thought about using the scheduler and/or reusing logic from the
> >> scheduler. I even experimented a little with it but it didn't allow me
> to
> >> pass the tests effectively.
> >>
> >> What I am planning to do is split the function and make it unit testable
> >> if you agree with the current approach.
> >>
> >> Bolke
> >>
> >> Sent from my iPhone
> >>
> >>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
> [email protected]>
> >> wrote:
> >>>
> >>> This PR is pretty massive and complex! It looks like solid work but
> let's
> >>> be really careful around testing and rolling this out.
> >>>
> >>> This may be out of scope for this PR, but wanted to discuss the idea of
> >>> using the scheduler's logic to perform backfills. It'd be nice to have
> >> that
> >>> logic in one place, though I lost grasp on the details around
> feasibility
> >>> around this approach. I'm sure you looked into this option before
> issuing
> >>> this PR and I'm curious to hear your thoughts on blockers/challenges
> >> around
> >>> this alternate approach.
> >>>
> >>> Also I'm wondering whether we have any sort of mechanisms in our
> >>> integration test to validate that task dependencies are respected and
> run
> >>> in the right order. If not I was thinking we could build some
> abstraction
> >>> to make it easy to write this type of tests in an expressive way.
> >>>
> >>> ```
> >>> #[some code to run a backfill, or a scheduler session]
> >>> it = IntegrationTestResults(dag_id='exmaple1')
> >>> assert it.ran_before('task1', 'task_2')
> >>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in
> >> parallel
> >>> assert ti.none_failed()
> >>> assert ti.ran_last('root')
> >>> assert ti.max_concurrency_reached() == POOL_LIMIT
> >>> ```
> >>>
> >>> Max
> >>>
> >>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]>
> >> wrote:
> >>>>
> >>>> I have worked in the Backfill issue also in collaboration with
> Jeremiah.
> >>>>
> >>>> The refactor to use dag runs in backfills caused a regression
> >>>> in task execution performance as dag runs were executed
> >>>> sequentially. Next to that, the backfills were non deterministic
> >>>> due to the random execution of tasks, causing root tasks
> >>>> being added to the non ready list too soon.
> >>>>
> >>>> This updates the backfill logic as follows:
> >>>>
> >>>>       • Parallelize execution of tasks
> >>>>       • Use a leave first execution model; Breadth-first algorithm by
> >>>> Jerermiah
> >>>>       • Replace state updates from the executor by task based only
> >>>> updates
> >>>>
> >>>> https://github.com/apache/incubator-airflow/pull/2107
> >>>>
> >>>> Please review and test properly.
> >>>>
> >>>> What has been left out at the moment is the checking the executor
> itself
> >>>> for multiple failures of a task run, where the task itself was never
> >> able
> >>>> to execute. Let me know if this is a real world scenario (maybe when
> >> disk
> >>>> space issue?). I will add it back in.
> >>>>
> >>>> - Bolke
> >>>>
> >>>>
> >>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> wrote:
> >>>>>
> >>>>> Hi Dan,
> >>>>>
> >>>>> - Backfill indeed runs only one dagrun at the time, see line 1755 of
> >>>> jobs.py. I’ll think about how to fix this over the weekend (I think it
> >> was
> >>>> my change that introduced this). Suggestions always welcome. Depending
> >> the
> >>>> impact it is a blocker or not. We don’t often use backfills and
> >> definitely
> >>>> not at your size, so that is why it didn’t pop up with us. I’m
> assuming
> >>>> blocker for now, btw.
> >>>>> - Speculation on the High DB Load. I’m not sure what your benchmark
> is
> >>>> here (1.7.1 + multi processor dags?), but as you mentioned in the code
> >>>> dependencies are checked a couple of times for one run and even task
> >>>> instance. Dependency checking requires aggregation on the DB, which
> is a
> >>>> performance killer. Annoying but not a blocker.
> >>>>> - Skipped tasks potentially cause a dagrun to be marked
> failure/success
> >>>> prematurely. BranchOperators are widely used if it affects these
> >> operators,
> >>>> then it is a blocker.
> >>>>>
> >>>>> - Bolke
> >>>>>
> >>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected].
> >> INVALID>
> >>>> wrote:
> >>>>>>
> >>>>>> Update on old pending issues:
> >>>>>> - Black Squares in UI: Fix merged
> >>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in
> flight
> >>>>>>
> >>>>>> New Issues:
> >>>>>> - Backfill seems to be having issues (only running one dagrun at a
> >>>> time),
> >>>>>> we are still investigating - might be a blocker
> >>>>>> - High DB Load (~8x more than 1.7) - We are still investigating but
> >> it's
> >>>>>> probably not a blocker for the release
> >>>>>> - Skipped tasks potentially cause a dagrun to be marked as
> >>>> failure/success
> >>>>>> prematurely - not sure whether or not to classify this as a blocker
> >>>> (only
> >>>>>> really an issue for users who use the BranchingPythonOperator, which
> >>>> AirBnB
> >>>>>> does)
> >>>>>>
> >>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <[email protected]
> >
> >>>> wrote:
> >>>>>>
> >>>>>>> IMHO, a DAG run without a start date is non-sensical but is not
> >>>> enforced
> >>>>>>> That said, our UI allows for the manual creation of DAG Runs
> without
> >> a
> >>>>>>> start date as shown in the images below:
> >>>>>>>
> >>>>>>>
> >>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
> >>>>>>> 202017-02-22%2016.00.40.png?dl=0
> >>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
> >>>>>>> 202017-02-22%2016.00.40.png?dl=0>
> >>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
> >>>>>>> 202017-02-22%2016.02.22.png?dl=0
> >>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
> >>>>>>> 202017-02-22%2016.02.22.png?dl=0>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> Our database may have edge cases that could be associated with
> >> running
> >>>>>>> any
> >>>>>>>> previous version that may or may not have been part of an official
> >>>>>>> release.
> >>>>>>>>
> >>>>>>>> Let's see if anyone else reports the issue. If no one does, one
> >>>> option is
> >>>>>>>> to release 1.8.0 as is with a comment in the release notes, and
> >> have a
> >>>>>>>> future official minor apache release 1.8.1 that would fix these
> >> minor
> >>>>>>>> issues that are not deal breaker.
> >>>>>>>>
> >>>>>>>> @bolke, I'm curious, how long does it take you to go through one
> >>>> release
> >>>>>>>> cycle? Oh, and do you have a documented step by step process for
> >>>>>>> releasing?
> >>>>>>>> I'd like to add the Pypi part to this doc and add committers that
> >> are
> >>>>>>>> interested to have rights on the project on Pypi.
> >>>>>>>>
> >>>>>>>> Max
> >>>>>>>>
> >>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <
> [email protected]>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> So it is a database integrity issue? Afaik a start_date should
> >> always
> >>>>>>> be
> >>>>>>>>> set for a DagRun (create_dagrun) does so  I didn't check the code
> >>>>>>> though.
> >>>>>>>>>
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>>
> >>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected].
> >>>>>>> INVALID>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Should clarify this occurs when a dagrun does not have a start
> >> date,
> >>>>>>>> not
> >>>>>>>>> a
> >>>>>>>>>> dag (which makes it even less likely to happen). I don't think
> >> this
> >>>>>>> is
> >>>>>>>> a
> >>>>>>>>>> blocker for releasing.
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> >>>>>>> [email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I rolled this out in our prod and the webservers failed to load
> >> due
> >>>>>>> to
> >>>>>>>>>>> this commit:
> >>>>>>>>>>>
> >>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>>>>>
> >>>>>>>>>>> This fixed it:
> >>>>>>>>>>> -                            </a> <span id="statuses_info"
> >>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"
> >>>>>>> title="Start
> >>>>>>>>> Date:
> >>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>>>>>> +                            </a> <span id="statuses_info"
> >>>>>>>>>>> class="glyphicon glyphicon-info-sign"
> aria-hidden="true"></span>
> >>>>>>>>>>>
> >>>>>>>>>>> This is caused by assuming that all DAGs have start dates set,
> >> so a
> >>>>>>>>> broken
> >>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make
> >> this a
> >>>>>>>>> blocker
> >>>>>>>>>>> for the release or not, I'm guessing for most deployments this
> >>>> would
> >>>>>>>>> occur
> >>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> >>>>>>>> [email protected]
> >>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <
> >>>> [email protected]
> >>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> >>>>>>>>>>>> 201702.mbox/%
> >>>>>>>>>>>>> [email protected]%3e <
> >>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> >>>>>>>>>>>> 201702.mbox/%
> >>>>>>>>>>>>> [email protected]%3E>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>> Bolke
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]
> >
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been
> >>>>>>> accepted.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 9 “+1” votes received:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Maxime Beauchemin (binding)
> >>>>>>>>>>>>>> - Arthur Wiedmer (binding)
> >>>>>>>>>>>>>> - Dan Davydov (binding)
> >>>>>>>>>>>>>> - Jeremiah Lowin (binding)
> >>>>>>>>>>>>>> - Siddharth Anand (binding)
> >>>>>>>>>>>>>> - Alex van Boxel (binding)
> >>>>>>>>>>>>>> - Bolke de Bruin (binding)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>>>>>>>>>>>>> - Yi (non-binding)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Vote thread (start):
> >>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-
> >>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
> >>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache.
> >>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-
> >>>>>>>>>>>> 092E-48D2-AA0F-
> >>>>>>>>>>>>> [email protected]%3E>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Next steps:
> >>>>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I
> do
> >>>>>>>> expect
> >>>>>>>>>>>>> some changes to be required mostly in documentation maybe a
> >>>>>>> license
> >>>>>>>>> here
> >>>>>>>>>>>>> and there. So, we might end up with changes to stable. As
> long
> >> as
> >>>>>>>>> these
> >>>>>>>>>>>> are
> >>>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> >>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and
> >> finalisation I
> >>>>>>>> will
> >>>>>>>>>>>>> rebrand the RC to Release.
> >>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the
> >> tar
> >>>>>>>> ball
> >>>>>>>>>>>>> needs to propagate to the mirrors.
> >>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems
> >> we
> >>>>>>> can
> >>>>>>>>>>>> keep
> >>>>>>>>>>>>> the apache branding as lib cloud is doing this as well (
> >>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package <
> >>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package>).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jippie!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Bolke
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>
>

Reply via email to