Eric Marsden wrote:
> I suspect that this is because your application is overflowing the
> available static space, then corrupting dynamic space.
>
>
> I added assertions to purify.c to check for overflow of the static
> space during purification, and your purify-test.lisp, which previously
> caused the same symptoms you have been seeing, now triggers these
> assertions inside ptrans_vector. My testing has been on x86/Linux;
> I'll check on Solaris tomorrow.
>
> CMUCL is unfortunately not able to resize these spaces dynamically, so
> the only solutions I see for your problem are:
>
> (1) you build a customized version of CMUCL with a sufficiently large
> static space. I don't think it would be reasonable to increase
> the default sizes much, because doing so would reduce the
> available dynamic space, which wouldn't be good for most users.
>
> (2) we find some way of disabling purification for certain data types
> (apparently code objects in your case), so that you can use a
> normal CMUCL build with partial purification.
>
> (3) we find the bug that prevents restarting of unpurified images (I
> think that this has been fixed in SBCL). Unpurified images will
> be noticeably slower, though.
Several problems are involved here. Let's review where things stand. The
summary message a while back listed three problems, #1, #2 and #3.
Here is my understanding of the current state of each.
--------
1. Can't do save-lisp :purify nil of the application. Failure mode: the
restarted application gets a division-by-zero error. This worked in 18e
and 2003_12; it does not work in 2004_04.
Eric's item (3) above suggests this may already be fixed. Great!
-------
2. Can't do save-lisp :purify t of the application. Failure mode: core
dump. This has never worked.
"We" have tried several experiments with experimental lisps, and the
results are confusing. However, I have just apparently been able to
reproduce the problem outside the context of our application; I sent
a message about this earlier today. I don't think this problem is
related to the size of static space: space in use at the time of the
purify is under 128M. There is some indication that the problem is
instead related to traversing large data structures during purification,
that is, to how much is moved, or how complex the moved data is, during
a single purify, rather than to the total amount that has been moved to
static space.
--------
3. Can't do purify on a consed-up dummy application when the heap
is larger than 128M. Failure mode: segmentation fault. I think the
current belief is that we are hitting a wall imposed by the size of
static space. This problem is something of a red herring, since the
real application never runs into it.
A solution to #3 could be to declare this a known limitation and
error out more gracefully. It would also be nice if the 128M wall could
be moved further out, but assuming the :purify nil problem (#1) is
fixed, there is already a way to save large applications that would
hit the wall. It sounds like some work has already been done on that
fix.
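As a rough illustration of what "error out more gracefully" could mean, here
is a minimal C sketch. It assumes static space can be modeled as a fixed-size
region filled by a bump pointer; struct region and region_copy are
hypothetical names for illustration only, not CMUCL's actual purify.c code.
The idea is to check remaining capacity before each copy and report the
shortfall, instead of writing past the end and corrupting dynamic space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size region standing in for static space.
   Names and layout are illustrative, not CMUCL's real data structures. */
struct region {
    char  *base;   /* start of the region */
    size_t size;   /* total capacity in bytes */
    size_t used;   /* bytes already handed out (invariant: used <= size) */
};

/* Copy nbytes from src into the region with a bounds check.
   Returns a pointer to the copy, or NULL after printing a diagnostic
   if the copy would overflow the region; the region is left unchanged. */
void *region_copy(struct region *r, const void *src, size_t nbytes)
{
    assert(r->used <= r->size);
    size_t free_bytes = r->size - r->used;   /* cannot underflow given the invariant */
    if (nbytes > free_bytes) {
        fprintf(stderr,
                "purify: static space exhausted (need %zu bytes, %zu free)\n",
                nbytes, free_bytes);
        return NULL;
    }
    void *dst = r->base + r->used;
    memcpy(dst, src, nbytes);
    r->used += nbytes;
    return dst;
}
```

A caller seeing NULL could then abort the save with a clear message rather
than continuing with a corrupted heap.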
To sum up, it looks like two of the three problems are on the way
to a solution. The remaining problem, #2, has been a tough nut to
crack, but perhaps the new ability to reproduce it outside our
application will provide new insight.
-bill-