rc + your patch (and a couple of our own custom ones)

On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin <[email protected]> wrote:

> Dan
>
> Btw are you running with my patch for this? Or still plain rc?
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
> > On 27 Feb 2017, at 22:46, Bolke de Bruin <[email protected]> wrote:
> >
> > I'll have a look. I verified and the code is there to take of this.
> >
> > B.
> >
> > Sent from my iPhone
> >
> >> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]>
> wrote:
> >>
> >> Repro steps:
> >> - Create a DAG with a dummy task
> >> - Let this DAG run for one dagrun
> >> - Add a new subdag operator that contains a dummy operator to this DAG
> that
> >> has depends_on_past set to true
> >> - click on the white square for the new subdag operator in the DAGs
> first
> >> dagrun
> >> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
> >> - Click the dummy task in the graph view and click "Mark Success"
> >> - Observe that the list of tasks to mark as success is empty (it should
> >> contain the dummy task)
> >>
> >>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]>
> wrote:
> >>>
> >>> Dan
> >>>
> >>> Can you elaborate on 2, cause I thought I specifically took care of
> that.
> >>>
> >>> Cheers
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected].
> INVALID>
> >>> wrote:
> >>>>
> >>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track
> the
> >>>> pending issues.
> >>>>
> >>>> There are two more issues we found which I included there:
> >>>> 1. Task instances that have their state manually set to running make
> the
> >>> UI
> >>>> for their DAG unable to parse
> >>>> 2. Mark success doesn't work for non existent task instances/dagruns
> >>> which
> >>>> breaks the subdag use case (setting tasks as successful via the graph
> >>> view)
> >>>>
> >>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]>
> >>> wrote:
> >>>>>
> >>>>> Hey Max
> >>>>>
> >>>>> It is massive for sure. Sorry about that ;-). However it is not as
> >>> massive
> >>>>> as you might deduct from a first view. 0) run tasks concurrently
> across
> >>> dag
> >>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating
> of
> >>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on
> the
> >>>>> executor for status updates was replaced, 4) (tbd) executor failure
> >>> check
> >>>>> to protect against endless Ioops.
> >>>>>
> >>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2
> is a
> >>>>> subtle change, that touches a couple of lines to pop/push properly.
> 3)
> >>> is
> >>>>> bigger, as I didn't like the reliance on the executor. 4) is old code
> >>> that
> >>>>> needs to be added again.
> >>>>>
> >>>>> I probably can leave out 3 which makes 4 mood. The change would be
> >>>>> smaller. Maybe I could even completely remove 3 and just add 4. What
> are
> >>>>> your thoughts?
> >>>>>
> >>>>> The random failures we were seeing were the "implicit" test of not a
> >>>>> executing in the right order and then deadlocking. But no explicit
> tests
> >>>>> exist. Help would definitely be appreciated.
> >>>>>
> >>>>> Yes I thought about using the scheduler and/or reusing logic from the
> >>>>> scheduler. I even experimented a little with it but it didn't allow
> me
> >>> to
> >>>>> pass the tests effectively.
> >>>>>
> >>>>> What I am planning to do is split the function and make it unit
> testable
> >>>>> if you agree with the current approach.
> >>>>>
> >>>>> Bolke
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
> >>> [email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> This PR is pretty massive and complex! It looks like solid work but
> >>> let's
> >>>>>> be really careful around testing and rolling this out.
> >>>>>>
> >>>>>> This may be out of scope for this PR, but wanted to discuss the
> idea of
> >>>>>> using the scheduler's logic to perform backfills. It'd be nice to
> have
> >>>>> that
> >>>>>> logic in one place, though I lost grasp on the details around
> >>> feasibility
> >>>>>> around this approach. I'm sure you looked into this option before
> >>> issuing
> >>>>>> this PR and I'm curious to hear your thoughts on blockers/challenges
> >>>>> around
> >>>>>> this alternate approach.
> >>>>>>
> >>>>>> Also I'm wondering whether we have any sort of mechanisms in our
> >>>>>> integration test to validate that task dependencies are respected
> and
> >>> run
> >>>>>> in the right order. If not I was thinking we could build some
> >>> abstraction
> >>>>>> to make it easy to write this type of tests in an expressive way.
> >>>>>>
> >>>>>> ```
> >>>>>> #[some code to run a backfill, or a scheduler session]
> >>>>>> it = IntegrationTestResults(dag_id='exmaple1')
> >>>>>> assert it.ran_before('task1', 'task_2')
> >>>>>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in
> >>>>> parallel
> >>>>>> assert ti.none_failed()
> >>>>>> assert ti.ran_last('root')
> >>>>>> assert ti.max_concurrency_reached() == POOL_LIMIT
> >>>>>> ```
> >>>>>>
> >>>>>> Max
> >>>>>>
> >>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]
> >
> >>>>> wrote:
> >>>>>>>
> >>>>>>> I have worked in the Backfill issue also in collaboration with
> >>> Jeremiah.
> >>>>>>>
> >>>>>>> The refactor to use dag runs in backfills caused a regression
> >>>>>>> in task execution performance as dag runs were executed
> >>>>>>> sequentially. Next to that, the backfills were non deterministic
> >>>>>>> due to the random execution of tasks, causing root tasks
> >>>>>>> being added to the non ready list too soon.
> >>>>>>>
> >>>>>>> This updates the backfill logic as follows:
> >>>>>>>
> >>>>>>>     • Parallelize execution of tasks
> >>>>>>>     • Use a leave first execution model; Breadth-first algorithm by
> >>>>>>> Jerermiah
> >>>>>>>     • Replace state updates from the executor by task based only
> >>>>>>> updates
> >>>>>>>
> >>>>>>> https://github.com/apache/incubator-airflow/pull/2107
> >>>>>>>
> >>>>>>> Please review and test properly.
> >>>>>>>
> >>>>>>> What has been left out at the moment is the checking the executor
> >>> itself
> >>>>>>> for multiple failures of a task run, where the task itself was
> never
> >>>>> able
> >>>>>>> to execute. Let me know if this is a real world scenario (maybe
> when
> >>>>> disk
> >>>>>>> space issue?). I will add it back in.
> >>>>>>>
> >>>>>>> - Bolke
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]>
> wrote:
> >>>>>>>>
> >>>>>>>> Hi Dan,
> >>>>>>>>
> >>>>>>>> - Backfill indeed runs only one dagrun at the time, see line 1755
> of
> >>>>>>> jobs.py. I’ll think about how to fix this over the weekend (I
> think it
> >>>>> was
> >>>>>>> my change that introduced this). Suggestions always welcome.
> Depending
> >>>>> the
> >>>>>>> impact it is a blocker or not. We don’t often use backfills and
> >>>>> definitely
> >>>>>>> not at your size, so that is why it didn’t pop up with us. I’m
> >>> assuming
> >>>>>>> blocker for now, btw.
> >>>>>>>> - Speculation on the High DB Load. I’m not sure what your
> benchmark
> >>> is
> >>>>>>> here (1.7.1 + multi processor dags?), but as you mentioned in the
> code
> >>>>>>> dependencies are checked a couple of times for one run and even
> task
> >>>>>>> instance. Dependency checking requires aggregation on the DB, which
> >>> is a
> >>>>>>> performance killer. Annoying but not a blocker.
> >>>>>>>> - Skipped tasks potentially cause a dagrun to be marked
> >>> failure/success
> >>>>>>> prematurely. BranchOperators are widely used if it affects these
> >>>>> operators,
> >>>>>>> then it is a blocker.
> >>>>>>>>
> >>>>>>>> - Bolke
> >>>>>>>>
> >>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected].
> >>>>> INVALID>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Update on old pending issues:
> >>>>>>>>> - Black Squares in UI: Fix merged
> >>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in
> >>> flight
> >>>>>>>>>
> >>>>>>>>> New Issues:
> >>>>>>>>> - Backfill seems to be having issues (only running one dagrun at
> a
> >>>>>>> time),
> >>>>>>>>> we are still investigating - might be a blocker
> >>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating
> but
> >>>>> it's
> >>>>>>>>> probably not a blocker for the release
> >>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as
> >>>>>>> failure/success
> >>>>>>>>> prematurely - not sure whether or not to classify this as a
> blocker
> >>>>>>> (only
> >>>>>>>>> really an issue for users who use the BranchingPythonOperator,
> which
> >>>>>>> AirBnB
> >>>>>>>>> does)
> >>>>>>>>>
> >>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <
> [email protected]
> >>>>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> IMHO, a DAG run without a start date is non-sensical but is not
> >>>>>>> enforced
> >>>>>>>>>> That said, our UI allows for the manual creation of DAG Runs
> >>> without
> >>>>> a
> >>>>>>>>>> start date as shown in the images below:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
> >>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0
> >>>>>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
> >>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0>
> >>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
> >>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0
> >>>>>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
> >>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> >>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Our database may have edge cases that could be associated with
> >>>>> running
> >>>>>>>>>> any
> >>>>>>>>>>> previous version that may or may not have been part of an
> official
> >>>>>>>>>> release.
> >>>>>>>>>>>
> >>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, one
> >>>>>>> option is
> >>>>>>>>>>> to release 1.8.0 as is with a comment in the release notes, and
> >>>>> have a
> >>>>>>>>>>> future official minor apache release 1.8.1 that would fix these
> >>>>> minor
> >>>>>>>>>>> issues that are not deal breaker.
> >>>>>>>>>>>
> >>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through
> one
> >>>>>>> release
> >>>>>>>>>>> cycle? Oh, and do you have a documented step by step process
> for
> >>>>>>>>>> releasing?
> >>>>>>>>>>> I'd like to add the Pypi part to this doc and add committers
> that
> >>>>> are
> >>>>>>>>>>> interested to have rights on the project on Pypi.
> >>>>>>>>>>>
> >>>>>>>>>>> Max
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <
> >>> [email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> So it is a database integrity issue? Afaik a start_date should
> >>>>> always
> >>>>>>>>>> be
> >>>>>>>>>>>> set for a DagRun (create_dagrun) does so  I didn't check the
> code
> >>>>>>>>>> though.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <
> [email protected].
> >>>>>>>>>> INVALID>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a
> start
> >>>>> date,
> >>>>>>>>>>> not
> >>>>>>>>>>>> a
> >>>>>>>>>>>>> dag (which makes it even less likely to happen). I don't
> think
> >>>>> this
> >>>>>>>>>> is
> >>>>>>>>>>> a
> >>>>>>>>>>>>> blocker for releasing.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> >>>>>>>>>> [email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to
> load
> >>>>> due
> >>>>>>>>>> to
> >>>>>>>>>>>>>> this commit:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger
> Dag
> >>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This fixed it:
> >>>>>>>>>>>>>> -                            </a> <span id="statuses_info"
> >>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"
> >>>>>>>>>> title="Start
> >>>>>>>>>>>> Date:
> >>>>>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>>>>>>>>> +                            </a> <span id="statuses_info"
> >>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign"
> >>> aria-hidden="true"></span>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates
> set,
> >>>>> so a
> >>>>>>>>>>>> broken
> >>>>>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make
> >>>>> this a
> >>>>>>>>>>>> blocker
> >>>>>>>>>>>>>> for the release or not, I'm guessing for most deployments
> this
> >>>>>>> would
> >>>>>>>>>>>> occur
> >>>>>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> >>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1
> (binding)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <
> >>>>>>> [email protected]
> >>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-
> general/
> >>>>>>>>>>>>>>> 201702.mbox/%
> >>>>>>>>>>>>>>>> [email protected]%3e <
> >>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-
> general/
> >>>>>>>>>>>>>>> 201702.mbox/%
> >>>>>>>>>>>>>>>> [email protected]%3E>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>> Bolke
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <
> [email protected]
> >>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been
> >>>>>>>>>> accepted.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 9 “+1” votes received:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding)
> >>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding)
> >>>>>>>>>>>>>>>>> - Dan Davydov (binding)
> >>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding)
> >>>>>>>>>>>>>>>>> - Siddharth Anand (binding)
> >>>>>>>>>>>>>>>>> - Alex van Boxel (binding)
> >>>>>>>>>>>>>>>>> - Bolke de Bruin (binding)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>>>>>>>>>>>>>>>> - Yi (non-binding)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Vote thread (start):
> >>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-
> >>>>>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
> >>>>>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache.
> >>>>>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%
> 3C7EB7B6D6-
> >>>>>>>>>>>>>>> 092E-48D2-AA0F-
> >>>>>>>>>>>>>>>> [email protected]%3E>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Next steps:
> >>>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC
> mailinglist. I
> >>> do
> >>>>>>>>>>> expect
> >>>>>>>>>>>>>>>> some changes to be required mostly in documentation maybe
> a
> >>>>>>>>>> license
> >>>>>>>>>>>> here
> >>>>>>>>>>>>>>>> and there. So, we might end up with changes to stable. As
> >>> long
> >>>>> as
> >>>>>>>>>>>> these
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>> not (significant) code changes I will not re-raise the
> vote.
> >>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and
> >>>>> finalisation I
> >>>>>>>>>>> will
> >>>>>>>>>>>>>>>> rebrand the RC to Release.
> >>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then
> the
> >>>>> tar
> >>>>>>>>>>> ball
> >>>>>>>>>>>>>>>> needs to propagate to the mirrors.
> >>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It
> seems
> >>>>> we
> >>>>>>>>>> can
> >>>>>>>>>>>>>>> keep
> >>>>>>>>>>>>>>>> the apache branding as lib cloud is doing this as well (
> >>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package <
> >>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package
> >).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jippie!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Bolke
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
>

Reply via email to