Arne Kepp wrote:
> On 8/30/10 8:52 AM, Andrea Aime wrote:
>> Justin Deoliveira wrote:
>>> Hi all,
>>>
>>> I wanted to poll the list to do some brainstorming about how to go about
>>> archiving request data information in the monitoring extension.
>>>
>>> As it stands now the extension simply persists request data (via
>>> hibernate) into a single table called "REQUEST". You can imagine that a
>>> busy server will fill this table up quite quickly, eventually getting to
>>> the point where operations start to perform poorly. So some sort of
>>> strategy to clear out the main table and possibly archive existing
>>> information will be needed.
>>>
>>> My practical database admin experience is quite limited, so I am very
>>> eager to hear thoughts on this one. That said, I can think of a couple
>>> of different ways to skin this cat.
>>>
>>> 1) no archiving
>>>
>>> Sort of lame, but leave all facilities for archiving to the db admin.
>>> The main downside of this is that it makes it impossible for the
>>> monitoring extension to retrieve historical request information for
>>> reporting purposes. So we would basically be saying you can only do your
>>> own analysis on your archived data.
>>>
>>> 2) automated archiving
>>>
>>> Flipping the coin would be to attempt to archive request data
>>> transparently. I tried to google for information about people using
>>> hibernate who need to archive in this manner, but could not come up with
>>> much. One thought I had was to come up with a second mapping, creating a
>>> table "REQUEST_HISTORY", and periodically move data from one table to
>>> the other. However, I am not sure how feasible this is strictly going
>>> through hibernate. It seems kind of clunky, and it could be problematic
>>> to synchronize, trying to archive while the server is still under load.
>>> I guess we could only support an "offline" archiving...
>>>
>>> Another thought I had that is less hibernate-centric would be to use
>>> triggers. Basically, duplicate the REQUEST table creating
>>> REQUEST_HISTORY, and set up a trigger on the main table so that any
>>> insert results in a duplicate row in the history table. Then an
>>> "archive" really just becomes clearing out the main request table.
>>> Obviously this would not be possible with straight hibernate, so it
>>> would require some homegrown jdbc stuff, which gets tricky when you try
>>> to support different databases; messing with dialects is sort of a messy
>>> business.
>>>
>>> 3) hybrid
>>>
>>> The third solution would sort of be a hybrid. Basically we would not
>>> provide any archiving facilities, leaving it up to the admin, but we
>>> would support querying a REQUEST_HISTORY table (if it exists) in order
>>> to retrieve the historical information.
>>>
>>> That's all I got. Any feedback on this one would be much appreciated.
>>
>> I would go for the second, possibly dodging Hibernate, which is normally
>> not well suited for mass data transfers (at least, it was not last time I
>> checked). The following:
>>
>> -- start transaction --
>> INSERT INTO REQUEST_HISTORY SELECT * FROM REQUEST WHERE DATE > ?
>> DELETE FROM REQUEST WHERE DATE > ?
>> -- commit transaction --
>>
>> should be well supported among databases + plain jdbc, without the need
>> for a translation layer in the middle.
>>
>> Cheers
>> Andrea
>
> I know that one big online advertiser just renames the old table, because
> it's a lot faster than copying millions of rows. A separate process then
> examines the history table and calculates aggregate values for the hits
> they don't need to store forever.
>
> Unfortunately renaming is an extension of the ANSI SQL standard, but I
> think it's pretty widely supported.
Ah. This approach works if the target name contains the month or the year,
meaning you get multiple archive tables that then have to be unioned somehow
when one needs to access data crossing the boundary between one archive and
the next. Fully agree about computing more compact aggregates, btw.
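
For the plain-jdbc route I suggested above, here is a minimal, untested
sketch of what I had in mind. It assumes the timestamp column is really
called DATE as in my snippet (being a reserved word, it may need quoting on
some databases), and it flips the comparison so that rows older than the
cutoff are the ones moved; how you obtain the Connection is left out:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class RequestArchiver {

    /**
     * Moves all rows older than the cutoff from REQUEST to REQUEST_HISTORY
     * in a single transaction, so a failure leaves both tables untouched.
     */
    public static void archiveOlderThan(Connection conn, Timestamp cutoff)
            throws SQLException {
        boolean autoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement copy = conn.prepareStatement(
                 "INSERT INTO REQUEST_HISTORY SELECT * FROM REQUEST WHERE DATE < ?");
             PreparedStatement purge = conn.prepareStatement(
                 "DELETE FROM REQUEST WHERE DATE < ?")) {
            copy.setTimestamp(1, cutoff);
            copy.executeUpdate();
            purge.setTimestamp(1, cutoff);
            purge.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // no rows lost, no rows duplicated
            throw e;
        } finally {
            conn.setAutoCommit(autoCommit);
        }
    }
}

Doing both statements in one transaction is the important bit: if the DELETE
fails halfway, the rollback leaves both tables exactly as they were, and a
scheduled job can simply call this with "now minus the retention period".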
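
And just to make the rename trick plus the "unioned somehow" part concrete,
a PostgreSQL-flavored sketch (table, column and view names are invented for
the example, and the view has to be recreated every time a new archive
table shows up):

-- rotate: near instant, no rows are copied
ALTER TABLE REQUEST RENAME TO REQUEST_2010_08;
CREATE TABLE REQUEST (LIKE REQUEST_2010_08 INCLUDING ALL);

-- one view that hides the archive boundaries from reporting queries
CREATE OR REPLACE VIEW REQUEST_ALL AS
SELECT * FROM REQUEST
UNION ALL SELECT * FROM REQUEST_2010_08
UNION ALL SELECT * FROM REQUEST_2010_07;

-- boil an old month down to daily aggregates before dropping the detail
CREATE TABLE REQUEST_SUMMARY_2010_07 AS
SELECT CAST("DATE" AS date) AS day, service, operation,
       COUNT(*) AS hits, AVG(response_time) AS avg_response_time
FROM REQUEST_2010_07
GROUP BY CAST("DATE" AS date), service, operation;
DROP TABLE REQUEST_2010_07;

The rotation would need to run inside a transaction or while holding a lock,
or inserts arriving between the RENAME and the CREATE will fail; PostgreSQL
can do transactional DDL, other databases vary. The last block is the
aggregation Arne mentioned: once a month has been boiled down to per-day
counts, the detail table can be dropped.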
Cheers
Andrea

-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.