I'll have a look. I verified and the code is there to take of this. 

B. 

Sent from my iPhone

> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]> wrote:
> 
> Repro steps:
> - Create a DAG with a dummy task
> - Let this DAG run for one dagrun
> - Add a new subdag operator that contains a dummy operator to this DAG that
> has depends_on_past set to true
> - click on the white square for the new subdag operator in the DAGs first
> dagrun
> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
> - Click the dummy task in the graph view and click "Mark Success"
> - Observe that the list of tasks to mark as success is empty (it should
> contain the dummy task)
> 
>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> wrote:
>> 
>> Dan
>> 
>> Can you elaborate on 2, cause I thought I specifically took care of that.
>> 
>> Cheers
>> Bolke
>> 
>> Sent from my iPhone
>> 
>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]>
>> wrote:
>>> 
>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
>>> pending issues.
>>> 
>>> There are two more issues we found which I included there:
>>> 1. Task instances that have their state manually set to running make the
>> UI
>>> for their DAG unable to parse
>>> 2. Mark success doesn't work for non existent task instances/dagruns
>> which
>>> breaks the subdag use case (setting tasks as successful via the graph
>> view)
>>> 
>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]>
>> wrote:
>>>> 
>>>> Hey Max
>>>> 
>>>> It is massive for sure. Sorry about that ;-). However it is not as
>> massive
>>>> as you might deduct from a first view. 0) run tasks concurrently across
>> dag
>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating of
>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on the
>>>> executor for status updates was replaced, 4) (tbd) executor failure
>> check
>>>> to protect against endless Ioops.
>>>> 
>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a
>>>> subtle change, that touches a couple of lines to pop/push properly. 3)
>> is
>>>> bigger, as I didn't like the reliance on the executor. 4) is old code
>> that
>>>> needs to be added again.
>>>> 
>>>> I probably can leave out 3 which makes 4 mood. The change would be
>>>> smaller. Maybe I could even completely remove 3 and just add 4. What are
>>>> your thoughts?
>>>> 
>>>> The random failures we were seeing were the "implicit" test of not a
>>>> executing in the right order and then deadlocking. But no explicit tests
>>>> exist. Help would definitely be appreciated.
>>>> 
>>>> Yes I thought about using the scheduler and/or reusing logic from the
>>>> scheduler. I even experimented a little with it but it didn't allow me
>> to
>>>> pass the tests effectively.
>>>> 
>>>> What I am planning to do is split the function and make it unit testable
>>>> if you agree with the current approach.
>>>> 
>>>> Bolke
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
>> [email protected]>
>>>> wrote:
>>>>> 
>>>>> This PR is pretty massive and complex! It looks like solid work but
>> let's
>>>>> be really careful around testing and rolling this out.
>>>>> 
>>>>> This may be out of scope for this PR, but wanted to discuss the idea of
>>>>> using the scheduler's logic to perform backfills. It'd be nice to have
>>>> that
>>>>> logic in one place, though I lost grasp on the details around
>> feasibility
>>>>> around this approach. I'm sure you looked into this option before
>> issuing
>>>>> this PR and I'm curious to hear your thoughts on blockers/challenges
>>>> around
>>>>> this alternate approach.
>>>>> 
>>>>> Also I'm wondering whether we have any sort of mechanisms in our
>>>>> integration test to validate that task dependencies are respected and
>> run
>>>>> in the right order. If not I was thinking we could build some
>> abstraction
>>>>> to make it easy to write this type of tests in an expressive way.
>>>>> 
>>>>> ```
>>>>> #[some code to run a backfill, or a scheduler session]
>>>>> it = IntegrationTestResults(dag_id='exmaple1')
>>>>> assert it.ran_before('task1', 'task_2')
>>>>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in
>>>> parallel
>>>>> assert ti.none_failed()
>>>>> assert ti.ran_last('root')
>>>>> assert ti.max_concurrency_reached() == POOL_LIMIT
>>>>> ```
>>>>> 
>>>>> Max
>>>>> 
>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]>
>>>> wrote:
>>>>>> 
>>>>>> I have worked in the Backfill issue also in collaboration with
>> Jeremiah.
>>>>>> 
>>>>>> The refactor to use dag runs in backfills caused a regression
>>>>>> in task execution performance as dag runs were executed
>>>>>> sequentially. Next to that, the backfills were non deterministic
>>>>>> due to the random execution of tasks, causing root tasks
>>>>>> being added to the non ready list too soon.
>>>>>> 
>>>>>> This updates the backfill logic as follows:
>>>>>> 
>>>>>>      • Parallelize execution of tasks
>>>>>>      • Use a leave first execution model; Breadth-first algorithm by
>>>>>> Jerermiah
>>>>>>      • Replace state updates from the executor by task based only
>>>>>> updates
>>>>>> 
>>>>>> https://github.com/apache/incubator-airflow/pull/2107
>>>>>> 
>>>>>> Please review and test properly.
>>>>>> 
>>>>>> What has been left out at the moment is the checking the executor
>> itself
>>>>>> for multiple failures of a task run, where the task itself was never
>>>> able
>>>>>> to execute. Let me know if this is a real world scenario (maybe when
>>>> disk
>>>>>> space issue?). I will add it back in.
>>>>>> 
>>>>>> - Bolke
>>>>>> 
>>>>>> 
>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Dan,
>>>>>>> 
>>>>>>> - Backfill indeed runs only one dagrun at the time, see line 1755 of
>>>>>> jobs.py. I’ll think about how to fix this over the weekend (I think it
>>>> was
>>>>>> my change that introduced this). Suggestions always welcome. Depending
>>>> the
>>>>>> impact it is a blocker or not. We don’t often use backfills and
>>>> definitely
>>>>>> not at your size, so that is why it didn’t pop up with us. I’m
>> assuming
>>>>>> blocker for now, btw.
>>>>>>> - Speculation on the High DB Load. I’m not sure what your benchmark
>> is
>>>>>> here (1.7.1 + multi processor dags?), but as you mentioned in the code
>>>>>> dependencies are checked a couple of times for one run and even task
>>>>>> instance. Dependency checking requires aggregation on the DB, which
>> is a
>>>>>> performance killer. Annoying but not a blocker.
>>>>>>> - Skipped tasks potentially cause a dagrun to be marked
>> failure/success
>>>>>> prematurely. BranchOperators are widely used if it affects these
>>>> operators,
>>>>>> then it is a blocker.
>>>>>>> 
>>>>>>> - Bolke
>>>>>>> 
>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected].
>>>> INVALID>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Update on old pending issues:
>>>>>>>> - Black Squares in UI: Fix merged
>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in
>> flight
>>>>>>>> 
>>>>>>>> New Issues:
>>>>>>>> - Backfill seems to be having issues (only running one dagrun at a
>>>>>> time),
>>>>>>>> we are still investigating - might be a blocker
>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating but
>>>> it's
>>>>>>>> probably not a blocker for the release
>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as
>>>>>> failure/success
>>>>>>>> prematurely - not sure whether or not to classify this as a blocker
>>>>>> (only
>>>>>>>> really an issue for users who use the BranchingPythonOperator, which
>>>>>> AirBnB
>>>>>>>> does)
>>>>>>>> 
>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <[email protected]
>>> 
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> IMHO, a DAG run without a start date is non-sensical but is not
>>>>>> enforced
>>>>>>>>> That said, our UI allows for the manual creation of DAG Runs
>> without
>>>> a
>>>>>>>>> start date as shown in the images below:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0
>>>>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%
>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0>
>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0
>>>>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%
>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Our database may have edge cases that could be associated with
>>>> running
>>>>>>>>> any
>>>>>>>>>> previous version that may or may not have been part of an official
>>>>>>>>> release.
>>>>>>>>>> 
>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, one
>>>>>> option is
>>>>>>>>>> to release 1.8.0 as is with a comment in the release notes, and
>>>> have a
>>>>>>>>>> future official minor apache release 1.8.1 that would fix these
>>>> minor
>>>>>>>>>> issues that are not deal breaker.
>>>>>>>>>> 
>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through one
>>>>>> release
>>>>>>>>>> cycle? Oh, and do you have a documented step by step process for
>>>>>>>>> releasing?
>>>>>>>>>> I'd like to add the Pypi part to this doc and add committers that
>>>> are
>>>>>>>>>> interested to have rights on the project on Pypi.
>>>>>>>>>> 
>>>>>>>>>> Max
>>>>>>>>>> 
>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <
>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> So it is a database integrity issue? Afaik a start_date should
>>>> always
>>>>>>>>> be
>>>>>>>>>>> set for a DagRun (create_dagrun) does so  I didn't check the code
>>>>>>>>> though.
>>>>>>>>>>> 
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>> 
>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected].
>>>>>>>>> INVALID>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a start
>>>> date,
>>>>>>>>>> not
>>>>>>>>>>> a
>>>>>>>>>>>> dag (which makes it even less likely to happen). I don't think
>>>> this
>>>>>>>>> is
>>>>>>>>>> a
>>>>>>>>>>>> blocker for releasing.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to load
>>>> due
>>>>>>>>> to
>>>>>>>>>>>>> this commit:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This fixed it:
>>>>>>>>>>>>> -                            </a> <span id="statuses_info"
>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"
>>>>>>>>> title="Start
>>>>>>>>>>> Date:
>>>>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
>>>>>>>>>>>>> +                            </a> <span id="statuses_info"
>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign"
>> aria-hidden="true"></span>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates set,
>>>> so a
>>>>>>>>>>> broken
>>>>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make
>>>> this a
>>>>>>>>>>> blocker
>>>>>>>>>>>>> for the release or not, I'm guessing for most deployments this
>>>>>> would
>>>>>>>>>>> occur
>>>>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
>>>>>>>>>> [email protected]
>>>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <
>>>>>> [email protected]
>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> IPMC Voting can be found here:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
>>>>>>>>>>>>>> 201702.mbox/%
>>>>>>>>>>>>>>> [email protected]%3e <
>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
>>>>>>>>>>>>>> 201702.mbox/%
>>>>>>>>>>>>>>> [email protected]%3E>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]
>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been
>>>>>>>>> accepted.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 9 “+1” votes received:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding)
>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding)
>>>>>>>>>>>>>>>> - Dan Davydov (binding)
>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding)
>>>>>>>>>>>>>>>> - Siddharth Anand (binding)
>>>>>>>>>>>>>>>> - Alex van Boxel (binding)
>>>>>>>>>>>>>>>> - Bolke de Bruin (binding)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
>>>>>>>>>>>>>>>> - Yi (non-binding)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Vote thread (start):
>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-
>>>>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
>>>>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache.
>>>>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-
>>>>>>>>>>>>>> 092E-48D2-AA0F-
>>>>>>>>>>>>>>> [email protected]%3E>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Next steps:
>>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I
>> do
>>>>>>>>>> expect
>>>>>>>>>>>>>>> some changes to be required mostly in documentation maybe a
>>>>>>>>> license
>>>>>>>>>>> here
>>>>>>>>>>>>>>> and there. So, we might end up with changes to stable. As
>> long
>>>> as
>>>>>>>>>>> these
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and
>>>> finalisation I
>>>>>>>>>> will
>>>>>>>>>>>>>>> rebrand the RC to Release.
>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the
>>>> tar
>>>>>>>>>> ball
>>>>>>>>>>>>>>> needs to propagate to the mirrors.
>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems
>>>> we
>>>>>>>>> can
>>>>>>>>>>>>>> keep
>>>>>>>>>>>>>>> the apache branding as lib cloud is doing this as well (
>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package <
>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package>).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jippie!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 

Reply via email to