> On Jan. 25, 2013, 4:13 p.m., Nathan Binkert wrote:
> > Seems like overkill to me. If you do this, then you can't do any math
> > using SQL and you have to suck out values to do anything. If that's the
> > attitude, why even bother using sqlite at all?
>
> Ali Saidi wrote:
>     You can't do math in sql, but that probably wasn't what you wanted to do
>     anyway. You probably want to suck the data back in the python class hierarchy
>     and manipulate it there. I think the ideal situation would be to pickle the
>     objects and not use sql, however that was much slower. The slowest (and
>     largest) was having a sql table of stat,x,y,value columns which meant reading
>     a large array took forever.
>
> Nathan Binkert wrote:
>     Interesting. When I was doing tons of sampling, doing the math in SQL
>     was exactly what I wanted to do because I could do queries in moments
>     compared to loading several gigabytes of data and then processing it. All of
>     the context stuff and the stuff in util/stats/db.py was to do that. The nice
>     thing about the database is that you can build up a very large database of
>     stats across many experiments that have many samples, and with SQL, you can
>     really quickly query those stats. If you're just trying to have something be
>     a binary format, you may as well just serialize as json (or msgpack) and gzip
>     the whole file. I, personally, found the SQL thing to be awesome. I could
>     regenerate complex graphs in moments. (Not to mention the fact that SQL
>     actually implements tons of useful operations.)

Storing the binary data in SQL is a sensible middle ground at this point: you avoid the scenario you describe of having to unzip/unserialize the whole file, and can simply fetch the data you need through queries. You do still have to unzip/unserialize those pieces before you can manipulate them, but the benefit is that the size of the database stays manageable.

We tried a range of options and this seemed like a sensible starting point. If someone wants to extend or modify it going forward, that is of course very welcome.


- Andreas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1646/#review3936
-----------------------------------------------------------


On Jan. 15, 2013, 10:36 a.m., Andreas Hansson wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1646/
> -----------------------------------------------------------
>
> (Updated Jan. 15, 2013, 10:36 a.m.)
>
>
> Review request for Default.
>
>
> Description
> -------
>
> Changeset 9499:bc23f2c316fc
> ---------------------------
> stats: Store vector stats using doubles and compress with zlib
>
> This patch changes any arrays of values to be stored as an array of doubles,
> rather than floats in the SQL database. This is required as floats lose too
> much accuracy. For example, if the stats are read from the database, and
> injected back into gem5's stats system, then formulas can be recalculated. If
> floats are used, these formulas evaluate to be different from those originally
> calculated when creating the SQL database.
>
> As doubles take up twice the space of a float (8 Bytes vs 4 Bytes) the SQL
> database becomes larger. The end result is that the database is larger than
> the text based output without compression.
> Therefore, as the vector storage is already not human readable, we compress
> this field using zlib. zlib has been in the Python standard library since
> version 1.5.1, so it is already covered by the gem5 build prerequisites.
>
>
> Diffs
> -----
>
>   src/python/m5/stats/sql.py PRE-CREATION
>
> Diff: http://reviews.gem5.org/r/1646/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andreas Hansson
>
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
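
For readers following the thread, the storage scheme described in the changeset (a vector of doubles packed into one binary field and compressed with zlib) can be sketched roughly as below. This is only an illustration under stated assumptions and is not the code from src/python/m5/stats/sql.py: the table name `vector_stats`, its columns, the stat name, and the helper functions `pack_vector`, `unpack_vector` and `roundtrip_as_float` are all hypothetical. Only the standard-library modules `sqlite3`, `struct` and `zlib` are used.

```python
import sqlite3
import struct
import zlib


def pack_vector(values):
    """Pack numbers as little-endian 8-byte doubles, then zlib-compress."""
    raw = struct.pack("<%dd" % len(values), *values)
    return zlib.compress(raw)


def unpack_vector(blob):
    """Reverse pack_vector: decompress, then unpack 8 bytes per value."""
    raw = zlib.decompress(blob)
    count = len(raw) // 8
    return list(struct.unpack("<%dd" % count, raw))


def roundtrip_as_float(value):
    """Show what a value becomes if stored as a 4-byte float instead."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical schema: one row per vector stat, values held in a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vector_stats (name TEXT PRIMARY KEY, data BLOB)")

values = [0.1 * i for i in range(1000)]
stat_name = "example.vector_stat"  # made-up stat name for illustration
conn.execute(
    "INSERT INTO vector_stats (name, data) VALUES (?, ?)",
    (stat_name, pack_vector(values)),
)

# A query selects only the rows of interest; only those blobs are unpacked.
row = conn.execute(
    "SELECT data FROM vector_stats WHERE name = ?", (stat_name,)
).fetchone()
restored = unpack_vector(row[0])

print(restored == values)               # doubles survive the round trip
print(roundtrip_as_float(0.1) == 0.1)   # a 4-byte float does not
```

The sketch also shows the trade-off raised in the discussion: SQL can select which vectors to load, but it cannot do arithmetic on the contents of the compressed field, so any math on the values happens in Python after unpacking.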
