I run across that situation (#2) rather often in testing as well. Since it's an OOM situation, I think the only safe thing to do would be to somehow ask a job to abort (assuming that doesn't itself require any more pages?). I'm not sure how the buffer cache would communicate that information up to the correct layer, though.
- Ian

On Mon, May 18, 2015 at 2:35 PM, Mike Carey <[email protected]> wrote:

> Regarding 2, this should not happen in production, obviously, but it
> would be nice to catch it when it happens (needing a victim, can't find
> one) and throw up our hands rather than hanging.
>
> On 5/18/15 10:02 AM, [email protected] wrote:
>
> Comment #3 on issue 884 by [email protected]: Unexpected exceptions when
> the FrameSize and the PageSize is different
> https://code.google.com/p/asterixdb/issues/detail?id=884
>
> I investigated this issue and found why this situation occurs.
> There are two problems behind this situation.
>
> 1. When an in-memory component is about to be scheduled to be flushed,
> the component's writer counter can be greater than 0 based on the current
> code base/design.
> So, this should be considered as a valid situation (rather than an
> exception) and dealt with appropriately, i.e., the component must not be
> scheduled to be flushed.
>
> 2. The number of buffer pages are not enough if the frame size is set to
> 4096 and the buffer cache page size is set to 327680 in default
> asterix-build-configuration.xml.
> If the configuration file is set in this way, there are at most 11 pages
> in buffer cache, so buffer cache can not find victim pages since all
> pages are used. Thus, the test is hung.
> So, the buffer cache size (not the buffer cache page size) should be
> increased appropriately to avoid this situation.
>
> I have a fix for problem 1 and I will send a code review after I add
> comments to the fixed code such as why the situation can occur with a
> detail example scenario.
>
> Lastly, this issue is different from the situation described in issue 878
> even though both issue came from LSMHarness layer.
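
To make the "catch it and throw up our hands rather than hanging" idea concrete, here is a minimal sketch of a bounded victim search. It is not the actual AsterixDB/Hyracks buffer cache code: the class TinyBufferCache, the exception NoVictimException, and the maxSweeps limit are all hypothetical names made up for illustration, and pages are reduced to bare pin counters. The only point is that the search gives up after a fixed number of passes over the pages and reports the failure to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, heavily simplified page pool (NOT the real buffer cache). */
class TinyBufferCache {

    /** Hypothetical exception: every page stayed pinned for the whole search. */
    static class NoVictimException extends RuntimeException {
        NoVictimException(String message) {
            super(message);
        }
    }

    private final AtomicInteger[] pinCounts; // one pin counter per page
    private final int maxSweeps;             // assumed give-up threshold
    private int clockHand = 0;               // next page to inspect

    TinyBufferCache(int numPages, int maxSweeps) {
        this.pinCounts = new AtomicInteger[numPages];
        for (int i = 0; i < numPages; i++) {
            pinCounts[i] = new AtomicInteger(0);
        }
        this.maxSweeps = maxSweeps;
    }

    /**
     * Claims an unpinned page and returns its index. Instead of looping
     * forever when all pages are pinned, it stops after maxSweeps full
     * passes and throws, so a higher layer can decide to abort the job.
     */
    synchronized int pin() {
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            for (int i = 0; i < pinCounts.length; i++) {
                int candidate = clockHand;
                clockHand = (clockHand + 1) % pinCounts.length;
                // A page is a usable victim only if nobody holds a pin on it.
                if (pinCounts[candidate].compareAndSet(0, 1)) {
                    return candidate;
                }
            }
        }
        throw new NoVictimException("no unpinned page among "
                + pinCounts.length + " pages after " + maxSweeps + " sweeps");
    }

    /** Releases one pin on the given page. */
    void unpin(int page) {
        pinCounts[page].decrementAndGet();
    }
}
```

With 11 pages, as in the configuration described above, the twelfth pin() with nothing unpinned throws instead of hanging. How that exception would reach the layer that can actually abort a job is the open question from the reply, and the sketch does not address it.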
