Re: [isabelle-dev] Jenkins SPAM reduction

2016-07-16 Thread Lars Hupel
Glad you brought up this topic (although it isn't "spam" for any
meaningful definition of "spam"). I wanted to do it anyway, because
somebody asked me about it privately.

> Jenkins sends lots of mails, and I find it hard to pay attention to them
> at all. Even when I do look, the long message body makes it hard to find
> the key points: What failed? Why did it fail?

The message body is essentially the full output of something similar to
"isabelle build -v". As such, it is not excessively long. As usual, a
summary is printed at the very end, consisting of the sections "timing"
and "failed sessions".

>   * Tests are run less often, e.g. 2-3h after a push and including all
> later pushes in that time interval. This reduces test runs and increases
> chances that Isabelle + AFP correspond correctly when Jenkins makes a
> snapshot.

That would hardly be "continuous integration", more like "delayed
integration". As a first measure I have increased the grace period for
the "isabelle-repo" job from 5 seconds to 5 minutes. This should give
ample time to push already existing changesets to both repositories.
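The effect of such a grace period can be modeled as a debounce: a build
starts only after no further push has arrived within the window, so closely
spaced pushes to both repositories are tested together. This is an
illustrative sketch of the idea, not the actual Jenkins implementation:

```python
def build_times(push_times, grace):
    """Given sorted push timestamps (seconds), return the times at which
    a build would start: each build fires `grace` seconds after the last
    push of a burst, so a burst of pushes yields a single build."""
    builds = []
    i = 0
    n = len(push_times)
    while i < n:
        deadline = push_times[i] + grace
        j = i + 1
        # Extend the burst while the next push lands inside the window.
        while j < n and push_times[j] < deadline:
            deadline = push_times[j] + grace
            j += 1
        builds.append(deadline)
        i = j
    return builds

# Three pushes within 300 s (5 min) of each other become one build;
# a later isolated push gets its own build.
print(build_times([0, 100, 250, 1000], grace=300))
```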

Increasing the grace period even more does not make sense, for two reasons:
1) The vast majority of build failures were caused by large-scale
refactorings, which cannot be completed within 2 hours.
2) It contradicts your previous mail, where you wanted to know exactly
which push broke the build.

The missing tooling here is for continuously testing accumulated changes
to both the repository and the AFP in these situations, without
affecting the global build. Git people use branches for that. In
Mercurial, the replacement is unclear.

>   * Mails are sent only once per day, as a summary of broken sessions at
> the end of the day, not every intermediate state.

This is too stateful. A compromise I can offer is to send a mail
whenever the build breaks, but not when it remains broken.
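This transition-based policy can be sketched as follows; the function and
status names are purely illustrative and not taken from the Jenkins mail
plugin:

```python
# Decide whether to send a mail based only on the previous and current
# build status: mail on a fresh breakage, stay silent while it persists.
# (Illustrative sketch; names are not from the actual Jenkins plugin.)

def should_notify(previous: str, current: str) -> bool:
    """Send mail only on the transition from a working to a broken build."""
    return previous == "success" and current == "failure"

# A build that stays broken produces exactly one mail per breakage:
history = ["success", "failure", "failure", "success", "failure"]
mails = sum(
    should_notify(prev, curr)
    for prev, curr in zip(history, history[1:])
)
print(mails)
```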

For the record, attached is a list of triggers supported by the mail
plugin.

> If Jenkins were more like a queue management system, it could probably
> also provide immediate feedback to the person who pushed something
> broken to it.

Jenkins can already do that. It doesn't work for our repositories though
because there are usually no mail addresses associated with changesets.

Cheers
Lars
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] \nexists

2016-07-16 Thread Makarius
On 16/07/16 16:37, Lars Hupel wrote:
>> I am open to hearing arguments why latex documents need to be
>> continuously available. So far there have been none.
> 
> That statement is unrelated to what you wrote afterwards, but let me
> give you an argument anyway: The generated PDFs are, as you probably
> agree, formal documents. In that sense, they need to be checked
> continuously.

I invented that notion of formal documents myself, but I don't see the
reasoning why the latex runs need to be continuously checked. Nothing
depends on the resulting PDFs. When there is something wrong with them,
it is usually just a matter of untyped/unscoped latex macros, and not
formally relevant.

So Latex could be run once a week or once a month, and very little would
happen.


In contrast, a broken proof, definition, or ML programming interface
causes a complete halt of the subsequent sessions building on that.

Broken formal content is particularly annoying, because nothing can be
reasonably "pushed on top of it". It would make the formal holes bigger
and bigger.


> But the situation is completely clear. Jenkins gives you an overview of
> what exactly broke, by means of the "changes" page:
> 
>   
> 
> It even provides handy links to the changesets.

I still have problems reading these Jenkins web pages: they seem to
contain 90% irrelevant information.


> To give useful performance metrics, here are
> some questions off the top of my head:
> 
> - Measure the whole build time or just sessions?

Separate session build times are important, but overall times are also
useful to know.


> - Which sessions? What should be the minimum threshold for session
> runtime to be considered useful?

There could indeed be a threshold, e.g. a 10-20s minimum session build
time for a session to be shown at all.
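Such a cutoff could be as simple as the following sketch (session names
and times are invented for illustration):

```python
def report_sessions(timings, threshold=15.0):
    """Keep only sessions whose elapsed time is at or above the
    threshold (seconds), sorted slowest first, so that short sessions
    do not clutter the report."""
    return sorted(
        ((name, t) for name, t in timings.items() if t >= threshold),
        key=lambda item: -item[1],
    )

timings = {"HOL": 1800.0, "Pure": 25.0, "HOL-ex": 12.0}
print(report_sessions(timings))
```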


> - Elapsed time or CPU time?

Both are important.

Also relevant are the detailed parallel runtime parameters that are
emitted every 0.5s while a session is running: number of active threads,
pending futures, etc.


> - What about sessions that grow in size over time?

That is indeed important, although we have just ignored it historically.

The AFP/Collections sessions are pretty close to the inevitable concrete
wall. It would be better to see in some nice graph how the approach
towards hard limits progresses, and how attempts to postpone the hit
show up as changed performance figures.
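One way to make the approach towards hard limits visible is to fit a
trend to historical session times and project when a given limit would
be crossed. A minimal sketch with an ordinary least-squares fit (all
numbers invented):

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def builds_until_limit(times, limit):
    """Project the build index at which session time crosses the limit,
    assuming the linear trend continues. Returns None for a flat or
    improving trend."""
    xs = list(range(len(times)))
    slope, intercept = linear_fit(xs, times)
    if slope <= 0:
        return None
    return (limit - intercept) / slope

# Session time grows 10 s per build from 100 s; a 200 s limit is hit
# at build index 10.
print(builds_until_limit([100, 110, 120, 130], limit=200))
```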


Makarius



[isabelle-dev] Jenkins SPAM reduction

2016-07-16 Thread Makarius
Jenkins sends lots of mails, and I find it hard to pay attention to them
at all. Even when I do look, the long message body makes it hard to find
the key points: What failed? Why did it fail?

Moreover, Isabelle + AFP get naturally out of sync, because pushes on
AFP are often done later, but Jenkins assumes them to be very close.


How about this:

  * Tests are run less often, e.g. 2-3h after a push and including all
later pushes in that time interval. This reduces test runs and increases
chances that Isabelle + AFP correspond correctly when Jenkins makes a
snapshot.

  * Mails are sent only once per day, as a summary of broken sessions at
the end of the day, not every intermediate state.


If Jenkins were more like a queue management system, it could probably
also provide immediate feedback to the person who pushed something
broken to it.


Makarius



Re: [isabelle-dev] \nexists

2016-07-16 Thread Lars Hupel
> I am open to hearing arguments why latex documents need to be
> continuously available. So far there have been none.

That statement is unrelated to what you wrote afterwards, but let me
give you an argument anyway: The generated PDFs are, as you probably
agree, formal documents. In that sense, they need to be checked
continuously.

> Instead we have in Isabelle/76492eaf3dc1 + AFP/89fba56726d8 a typical
> situation where formal checking no longer works, due to changes in main
> Isabelle that are not immediately clear.

But the situation is completely clear. Jenkins gives you an overview of
what exactly broke, by means of the "changes" page:

  

It even provides handy links to the changesets.

> In the last 10 years, continuous testing of Isabelle + AFP had the
> following main purposes:
> 
>   * Make sure that formal checking of Isabelle always works 100% (main
> repository).

Still satisfied.

>   * Help to figure out quickly why a broken AFP stopped working. When
> did it work last time with Isabelle?

Still satisfied.

>   * Provide continuous performance measurements to keep everything going
> in the longer run: i.e. to avoid the "invisible concrete wall" that is
> always ahead of us, and moved occasionally further ahead.

The numbers are there, but we need some good way to evaluate them. I
have pointed that out multiple times.

Let me be very clear: I have invested a lot of time in a robust,
reliable, reproducible testing infrastructure. Almost everything now
uses standardized tools instead of ad-hoc Bash or Perl scripts and
consumes or produces structured data. But I cannot single-handedly
implement all the features. To give useful performance metrics, here are
some questions off the top of my head:

- Measure the whole build time or just sessions?
- Which sessions? What should be the minimum threshold for session
runtime to be considered useful?
- Elapsed time or CPU time?
- What about sessions that grow in size over time?
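The elapsed-vs-CPU question above can be made concrete: recording both
allows deriving the parallel speedup of a session, i.e. how many cores it
kept busy on average. A small sketch (field names are illustrative, not
taken from the Isabelle build tool):

```python
from dataclasses import dataclass

@dataclass
class SessionMetrics:
    """One measurement of a session build (names are illustrative)."""
    name: str
    elapsed: float  # wall-clock seconds
    cpu: float      # total CPU seconds across all threads

    @property
    def speedup(self) -> float:
        """Average number of cores the session kept busy."""
        return self.cpu / self.elapsed

m = SessionMetrics("HOL", elapsed=300.0, cpu=2100.0)
print(f"{m.name}: speedup {m.speedup:.1f}x")
```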

Cheers
Lars


Re: [isabelle-dev] \nexists

2016-07-16 Thread Makarius
On 16/07/16 14:26, Tobias Nipkow wrote:
> But it turns out you are
> right: combining -a and -o document=pdf is not a good idea, it breaks
> Logics_ZF.

In which way does it fail?

I am running "isabelle build -a -o document=pdf" frequently. It should
definitely work.

It might be just an example of latex failure due to some odd latex
installation.


Makarius



Re: [isabelle-dev] \nexists

2016-07-16 Thread Makarius
On 16/07/16 14:26, Tobias Nipkow wrote:
>>
>> In contrast, a full "build -a" for the formal content is vital. The
>> shorter that takes, the more it becomes second nature to do it
>> frequently -- I often do it for every single changeset (before the local
>> commit).
> 
> Your notion of what is good for developers differs from their own
> notion, at least for the ones I talked to. But it turns out you are
> right: combining -a and -o document=pdf is not a good idea, it breaks
> Logics_ZF.

I am open to hearing arguments why latex documents need to be
continuously available. So far there have been none.

Instead we have in Isabelle/76492eaf3dc1 + AFP/89fba56726d8 a typical
situation where formal checking no longer works, due to changes in main
Isabelle that are not immediately clear.


In the last 10 years, continuous testing of Isabelle + AFP had the
following main purposes:

  * Make sure that formal checking of Isabelle always works 100% (main
repository).

  * Help to figure out quickly why a broken AFP stopped working. When
did it work last time with Isabelle?

  * Provide continuous performance measurements to keep everything going
in the longer run: i.e. to avoid the "invisible concrete wall" that is
always ahead of us, and moved occasionally further ahead.


Little of that has been provided by the testing infrastructure in
recent years. The new Jenkins setup and the new hardware have improved
the situation a bit, but we are still below the level of some years
ago.

This is why I am doing more testing manually on my own machine than was
necessary in the past.


Makarius



[isabelle-dev] Broken AFP

2016-07-16 Thread Manuel Eberl

Hello,

sorry for the broken AFP – it's my fault, but I added a lot of new 
material about rings and prime factor decomposition. The projects that 
were affected by this the most are already fixed, and I will take care 
of the rest on Monday.


Manuel


Re: [isabelle-dev] \nexists

2016-07-16 Thread Tobias Nipkow



On 15/07/2016 20:33, Makarius wrote:
> On 15/07/16 17:08, Tobias Nipkow wrote:
>> For me a default is something that a) is beneficial for the majority of
>> users and b) can be overwritten if you don't like it. Isn't that
>> possible here?
>
> That is the existing default of not insisting in Latex documents in
> every single point of history. When there is something wrong with Latex,
> it is usually easy to repair afterwards.
>
> Asking everybody to test Latex every time imposes a burden with little
> benefit.
>
> In contrast, a full "build -a" for the formal content is vital. The
> shorter that takes, the more it becomes second nature to do it
> frequently -- I often do it for every single changeset (before the local
> commit).


Your notion of what is good for developers differs from their own
notion, at least for the ones I talked to. But it turns out you are
right: combining -a and -o document=pdf is not a good idea; it breaks
Logics_ZF.


Tobias



> The current Isabelle Jenkins setup has changed focus slightly. There are
> more continuously built "artifacts", like documents, but important
> "telemetry" data visualization is missing. So we are flying blind
> concerning performance figures, not just for AFP (as we do for many
> years), but also for the main repository.
>
> What is really needed now, is a replacement for
> Admin/isatest/isatest-statistics.
>
> That also depends on good physical measurements: that was historically
> done without Latex getting in between.
>
>
> Makarius







[isabelle-dev] NEWS: proof outline with cases

2016-07-16 Thread Makarius
*** Prover IDE -- Isabelle/Scala/jEdit ***

* Command 'proof' provides information about proof outline with cases,
e.g. for proof methods "cases", "induct", "goal_cases".


This refers to Isabelle/6c46a1e786da. The main functionality is in
Isabelle/9f8d06f23c09, which also demonstrates how to do such things
with existing PIDE infrastructure.


Makarius