On Thu, Apr 4, 2013 at 9:35 PM, David MENTRÉ <dmen...@linux-france.org> wrote:

> Hello Maxime,
>
> On 04/04/2013 20:45, Maxime Petazzoni wrote:
>
>>> Isn't postgresql vacuum process supposed to do that? Do we need a
>>> cleanup process at application level?
>>
>> Vacuum takes an exclusive lock on the database and takes (interestingly)
>> longer than doing a full planet import.
>>
>
> Yes, that's strange! Those database issues are outside my knowledge, but I
> nonetheless think this vacuum behaviour should not occur.


At the very least I think it's an interesting test case to bring the
Postgres/PostGIS guys in on; we might have stumbled onto an edge case in
vacuum performance.
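
If it helps, a small diagnostic along these lines would give them something
concrete to look at. Just a sketch: the database name "maposmatic_gis" is my
guess, and psycopg2 is only used here because it's handy. It dumps dead-tuple
counts and the last (auto)vacuum times for the most bloated tables, which is
probably the first thing they'd ask for:

import psycopg2

BLOAT_QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20
"""

def dump_vacuum_stats(dbname="maposmatic_gis"):
    # Connect to the rendering database and print, for the 20 tables with
    # the most dead tuples, how many live/dead rows they hold and when they
    # were last vacuumed (manually or by autovacuum).
    conn = psycopg2.connect(dbname=dbname)
    cur = conn.cursor()
    cur.execute(BLOAT_QUERY)
    for relname, live, dead, last_vac, last_auto in cur.fetchall():
        print("%s: live=%s dead=%s last_vacuum=%s last_autovacuum=%s"
              % (relname, live, dead, last_vac, last_auto))
    cur.close()
    conn.close()

if __name__ == "__main__":
    dump_vacuum_stats()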


>>> Doing an import on a regular basis is not sustainable.
>>
>> The import is actually pretty quick on SSD. But yeah, it's a bit
>> annoying. I was actually wondering if we should simply do a full import
>> every day and swap (with minutely updates in between).
>

> So we have enough SSD space for two full imports but not enough for one
> full import plus several weeks/months/... of updates? Therefore we clearly
> have an issue with vacuum. Maybe we need to trigger it more frequently, let
> it do its job by stopping updates for a while, give it more priority, etc.
>
> But if you prefer doing a full import every 12h or 24h, that's fine for me.
> The MapOSMatic service is still useful with such a delay.


I was also thinking along these lines. If the growth is due to the lack of
vacuuming, and the vacuuming takes this long because it has never been done
since the initial import (I know the full import disables it), it may very
well be that suspending renders and minutely updates every hour and
triggering a vacuum would bring the database size down in minutes flat.

So, hypothetically, every hour on the hour (a rough sketch follows below):
- Suspend update and render queues
- Wait for the current render and/or update, if any, to finish
- Vacuum
- Catch up on minutely updates (should only be a few if vacuum runs hourly)
- Log the disk space taken by the database*
- Wake update and render queues

* Every so often, compare whether this still corresponds to the size
expected from a full import. This'll help determine whether vacuum is
broken, and make sure the database isn't silently filling up the SSD.
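
Something like the following is what I have in mind for the hourly pass.
It's only a sketch: the database name "maposmatic_gis" and the
suspend_queues()/catch_up_minutely()/resume_queues() hooks are placeholders
for whatever actually drives our render and update daemons, not existing
code.

import logging
import psycopg2

DB_NAME = "maposmatic_gis"  # placeholder name

def suspend_queues():
    """Placeholder: pause the render and minutely-update queues and wait
    for any render/update currently in progress to finish."""

def catch_up_minutely():
    """Placeholder: apply the minutely diffs that accumulated meanwhile."""

def resume_queues():
    """Placeholder: wake the render and minutely-update queues."""

def hourly_maintenance():
    suspend_queues()
    try:
        conn = psycopg2.connect(dbname=DB_NAME)
        conn.autocommit = True  # VACUUM cannot run inside a transaction block
        cur = conn.cursor()

        # Plain VACUUM only marks dead rows as reusable; if we actually need
        # the files shrunk on disk, VACUUM FULL would be required instead, at
        # the cost of the exclusive lock Maxime mentioned.
        cur.execute("VACUUM ANALYZE")

        # Catch up on the minutely diffs that accumulated while we were busy.
        catch_up_minutely()

        # Log the on-disk size so it can be compared against what a fresh
        # full import would take (see the footnote above).
        cur.execute(
            "SELECT pg_size_pretty(pg_database_size(current_database()))")
        logging.info("database size after vacuum: %s", cur.fetchone()[0])
        cur.close()
        conn.close()
    finally:
        resume_queues()

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    hourly_maintenance()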

This might be preferable to a double-buffering approach, what with the
planet dump size ever increasing.

A hybrid approach could be to set up several databases, one per continent.
Smaller, still no problem with relations on boundaries, and quicker to
vacuum and update. That is, of course, if someone has preprocessed minutely
updates at that scale...

Having said this, if there's enough room for two full planet imports plus a
day's worth of updates, and room to grow before the expected lifetime of the
SSD runs out, then obviously Maxime's proposed solution is a heck of a lot
easier to implement.
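
For reference, the import-and-swap variant could be about as simple as the
sketch below, assuming osm2pgsql does the import and renders can be paused
for the few seconds the rename takes. The database names and the planet file
path are again placeholders, not our actual setup.

import subprocess
import psycopg2

LIVE_DB = "maposmatic_gis"          # placeholder: database the renderer uses
STAGING_DB = "maposmatic_gis_new"   # placeholder: target of the fresh import
OLD_DB = "maposmatic_gis_old"
PLANET_FILE = "/data/planet-latest.osm.pbf"  # placeholder path

def daily_import_and_swap():
    # 1. Full import into the staging database while the live one keeps
    #    serving renders; this is where the second copy's worth of SSD goes.
    subprocess.check_call(
        ["osm2pgsql", "--create", "--slim", "-d", STAGING_DB, PLANET_FILE])

    # 2. Swap the two databases by renaming them. ALTER DATABASE ... RENAME
    #    refuses to run while anyone is connected to the database being
    #    renamed, so the render and update queues have to be paused for this
    #    step, just as in the hourly-vacuum sketch.
    conn = psycopg2.connect(dbname="postgres")
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute('ALTER DATABASE "%s" RENAME TO "%s"' % (LIVE_DB, OLD_DB))
    cur.execute('ALTER DATABASE "%s" RENAME TO "%s"' % (STAGING_DB, LIVE_DB))
    cur.execute('DROP DATABASE "%s"' % OLD_DB)
    cur.close()
    conn.close()

if __name__ == "__main__":
    daily_import_and_swap()

Dropping the old database right after the swap keeps the peak usage at
roughly two full imports' worth of SSD space.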

-- 
↑↑↓↓←→←→BA[Start]
