Re: [gem5-dev] Review Request: stats: Store vector stats using doubles and compress with zlib

Nathan Binkert Tue, 05 Feb 2013 15:41:25 -0800


> On Jan. 25, 2013, 4:13 p.m., Nathan Binkert wrote:
> > Seems like overkill to me.  If you do this, then you can't do any math 
> > using SQL and you have to suck out values to do anything.  If that's the 
> > attitude, why even bother using sqlite at all?
> 
> Ali Saidi wrote:
>     You can't do math in sql, but that probably wasn't what you wanted to do 
> anyway. You probably want to suck the data back in the python class hierarchy 
> and manipulate it there. I think the ideal situation would be to pickle the 
> objects and not use sql, however that was much slower. The slowest (and 
> largest) was having a sql table of stat,x,y,value columns which meant reading 
> a large array took forever.
> 
> Nathan Binkert wrote:
>     Interesting.  When I was doing tons of sampling, doing the math in SQL 
> was exactly what I wanted to do because I could do queries in moments 
> compared to loading several gigabytes of data and then processing it.  All of 
> the context stuff and the stuff in util/stats/db.py was to do that.  The nice 
> thing about the database is that you can build up a very large database of 
> stats across many experiments that have many samples, and with SQL, you can 
> really quickly query those stats.  If you're just trying to have something be 
> a binary format, you may as well just serialize as json (or msgpack) and gzip 
> the whole file.  I, personally, found the SQL thing to be awesome.  I could 
> regenerate complex graphs in moments.  (Not to mention the fact that SQL 
> actually implements tons of useful operations.)
> 
> Andreas Hansson wrote:
>     The binary data stored in SQL is a sensible middle ground at this point 
> as you can avoid the scenario you describe of having to unzip/unserialize the 
> whole file, and can simply get the data you need through queries. Then you 
> will indeed have to unzip/unserialize those bits before you can manipulate 
> them, but the benefit is that the size of the database is manageable.
>     
>     We tried a range of options and this seemed like a sensible starting 
> point. If someone wants to extend or modify it going forward that is of 
> course very welcome.


Personally, I don't think storing blobs in sqlite is particularly sensible.  It 
is seriously limiting.  If sqlite is to be the canonical storage format, it 
seems that it should be simple and obvious.  If you have particular 
storage/speed issues, then a secondary implementation might make sense (but 
don't call it sql since you're just using it for storage, not for SQL).  Then 
again, if you want to store blobs, why are you using sqlite at all? Why not use 
dbm?
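
To make the trade-off concrete, here is a minimal sketch of the two layouts being discussed. The table names, column names, and the "ipc" stat are hypothetical and are not taken from the patch. With one row per value, sqlite can do the arithmetic itself. With a compressed blob, the data has to be fetched, decompressed, and unpacked before any math can happen.

```python
import sqlite3
import struct
import zlib

values = [1.5, 2.5, 4.0]

db = sqlite3.connect(":memory:")
# Hypothetical schemas for illustration only, not the ones used by the patch.
db.execute("CREATE TABLE stat_rows (stat TEXT, x INTEGER, value REAL)")
db.execute("CREATE TABLE stat_blobs (stat TEXT, data BLOB)")

# Layout 1: one row per vector element.
db.executemany("INSERT INTO stat_rows VALUES (?, ?, ?)",
               [("ipc", i, v) for i, v in enumerate(values)])

# Layout 2: the whole vector as zlib-compressed little-endian doubles.
blob = zlib.compress(struct.pack("<%dd" % len(values), *values))
db.execute("INSERT INTO stat_blobs VALUES (?, ?)", ("ipc", blob))

# Row layout: the database does the math.
(row_sum,) = db.execute(
    "SELECT SUM(value) FROM stat_rows WHERE stat = 'ipc'").fetchone()

# Blob layout: fetch, decompress, and unpack before any math is possible.
(data,) = db.execute(
    "SELECT data FROM stat_blobs WHERE stat = 'ipc'").fetchone()
raw = zlib.decompress(data)
blob_sum = sum(struct.unpack("<%dd" % (len(raw) // 8), raw))
```

Both paths arrive at the same sum, but only the first one can be expressed as a query across many experiments without pulling the data into Python.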

Is the problem space or speed?  If the problem is speed, what operations are 
you doing?  If you're simply converting to text, then I'd say that's not a 
useful benchmark.


- Nathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1646/#review3936
-----------------------------------------------------------


On Jan. 15, 2013, 10:36 a.m., Andreas Hansson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1646/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2013, 10:36 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> Changeset 9499:bc23f2c316fc
> ---------------------------
> stats: Store vector stats using doubles and compress with zlib
> 
> This patch changes any arrays of values to be stored as an array of doubles,
> rather than floats in the SQL database. This is required as floats lose too
> much accuracy. For example, if the stats are read from the database, and
> injected back into gem5's stats system, then formulas can be recalculated. If
> floats are used, these formulas evaluate to be different from those originally
> calculated when creating the SQL database.
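
As a rough illustration of the accuracy point, the sketch below round-trips one value through a 4-byte float and an 8-byte double using Python's struct module. The value 0.1 is an arbitrary example, and this is not code from the patch.

```python
import struct

value = 0.1  # arbitrary example value, not a real gem5 stat

# Round-trip through a 4-byte IEEE 754 float and an 8-byte double.
as_float = struct.unpack("<f", struct.pack("<f", value))[0]
as_double = struct.unpack("<d", struct.pack("<d", value))[0]

# The double comes back bit-for-bit; the float does not.
float_error = abs(as_float - value)
double_error = abs(as_double - value)
```

Any formula recomputed from the float copy inherits that rounding error, which is consistent with the mismatch described above.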
> 
> As doubles take up twice the space of a float (8 bytes vs 4 bytes), the SQL
> database becomes larger. The end result is that the database is larger than
> the text-based output without compression. Therefore, as the vector storage
> is already not human readable, we compress this field using zlib. zlib has
> been in the Python standard library since version 1.5.1, so it is already
> covered in the gem5 build prerequisites.
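
A minimal sketch of the storage scheme described above, assuming the vector is packed as little-endian doubles before compression. The byte order and the sample data are assumptions, and the actual code in src/python/m5/stats/sql.py may differ.

```python
import struct
import zlib

# Made-up vector: mostly zeros, as sparse stat vectors often are.
values = [0.0] * 900 + [float(i) for i in range(100)]

# Pack as 8-byte doubles, then compress for storage in a blob column.
raw = struct.pack("<%dd" % len(values), *values)
packed = zlib.compress(raw)

# Reading back: decompress, then unpack into Python floats.
unpacked = zlib.decompress(packed)
restored = list(struct.unpack("<%dd" % (len(unpacked) // 8), unpacked))
```

The round trip is lossless, and for repetitive data like this the compressed blob is much smaller than the raw 8000 bytes. How much real stats shrink depends on their contents.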
> 
> 
> Diffs
> -----
> 
>   src/python/m5/stats/sql.py PRE-CREATION 
> 
> Diff: http://reviews.gem5.org/r/1646/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andreas Hansson
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
