> On Jan. 25, 2013, 9:58 a.m., Nilay Vaish wrote:
> > A double is 8 bytes, and each character in a text-based output is 
> > probably >= 1 byte, depending on the encoding. If the double value
> > actually holds less than 8 characters, I am surprised that a
> > float value does not suffice. What other info does the database 
> > include that is increasing its size?

We switched from float to double because of inaccuracies when formulas were 
recalculated.

I wrote a script which takes the stats from the SQL database and injects them 
back into the gem5 python stats system. This allowed me to generate a 
text-based stats file and an SQLite database for a gem5 run, then inject the 
data back into the stats system and re-generate the text-based output to ensure 
that the stats were being stored and retrieved correctly, i.e. that the 
original stats.txt matched the one generated from the SQLite database. When 
the data was stored in the database as floats, some of the formulas evaluated 
to significantly different results because precision was lost on storage. 
Storing doubles instead resolved the issue, as Python's "float" is actually 
64 bits (on most architectures/Python implementations).
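
To illustrate the effect, here is a minimal sketch using only the standard 
struct module. It is not the script described above; the stat names and values 
are made up, and a simple ratio stands in for a gem5 formula.

```python
import struct


def roundtrip(value, fmt):
    """Pack a Python float with the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical stat values, chosen only for illustration.
sim_insts = 987654321.0
sim_ticks = 123456789012.0

# The "formula" as computed from the in-memory 64-bit values.
original = sim_insts / sim_ticks

# The same formula recomputed after a round trip through storage.
as_f32 = roundtrip(sim_insts, "<f") / roundtrip(sim_ticks, "<f")
as_f64 = roundtrip(sim_insts, "<d") / roundtrip(sim_ticks, "<d")

print("original:", repr(original))
print("via f32 :", repr(as_f32))
print("via f64 :", repr(as_f64))
```

The 64-bit round trip reproduces the operands bit for bit, so the recomputed 
ratio is identical. The 32-bit round trip rounds both operands to 24 
significant bits, so the recomputed ratio no longer matches.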

To minimise the number of database accesses, vector stats (vector, vector2d 
and formulas) are stored as binary blobs, so all elements of a vector sit in 
a single database field. The side effect is that if you have, for example, a 
vector of length 10 with one actual value and nine NaNs, you still have to 
store the NaNs. Naturally, if you then double the space needed to store each 
value (including the NaNs), the database becomes very large.
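
As a rough sketch of the sizes involved (this is not the code in sql.py; the 
helper name and the little-endian layout are assumptions for illustration), 
the example vector can be packed densely and then compressed with zlib as the 
patch description proposes:

```python
import math
import struct
import zlib


def pack_vector(values, code):
    """Pack every element, NaNs included, into one little-endian blob.

    code is a struct type code: "f" for 32-bit or "d" for 64-bit floats.
    """
    return struct.pack("<%d%s" % (len(values), code), *values)


# A vector of length 10 with one actual value and nine NaNs.
vector = [float("nan")] * 10
vector[3] = 42.5

blob_f32 = pack_vector(vector, "f")
blob_f64 = pack_vector(vector, "d")
compressed = zlib.compress(blob_f64)

print("float blob :", len(blob_f32), "bytes")
print("double blob:", len(blob_f64), "bytes")
print("zlib blob  :", len(compressed), "bytes")

# Decompressing and unpacking gives back all ten elements.
restored = struct.unpack("<10d", zlib.decompress(compressed))
```

The dense blob grows from 40 to 80 bytes when moving to doubles, regardless of 
how many elements are NaN. Because the NaN bit pattern repeats, a blob like 
this one tends to compress well.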

In my view there are two alternatives to the approach in the patch:

1. Store each element of a vector in a separate table, and "reconstruct" the 
vector when we want the values. This has two side effects. Firstly, each 
access requires multiple database accesses, or complex joining of tables, 
which will increase the access time. Secondly, if each element is stored by 
itself, it also needs to be stored with the ID of the stat it belongs to, the 
index of the dump it belongs to and its position within the vector. This 
potentially requires more space than the approach in this patch. That said, it 
would allow only specific elements of the vector to be pulled from the database.

2. Manually pack the data into the blob field. We could store only the non-NaN 
data by packing each entry as <index within vector><value as double>. This has 
the advantage of storing only the data we care about (at the cost of also 
storing the index within the vector), and we could still pull the data out 
with one database access. However, we would then have the overhead of packing 
and unpacking the data, which is potentially very slow.
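
A minimal sketch of what alternative 2 could look like follows. The record 
layout (an unsigned 32-bit index followed by a double) and the function names 
are assumptions for illustration, and the vector length is assumed to be known 
from elsewhere, e.g. the stat's metadata.

```python
import math
import struct

# Hypothetical record layout: <index within vector><value as double>,
# little-endian, 4 + 8 = 12 bytes per stored element.
RECORD = struct.Struct("<Id")


def pack_sparse(values):
    """Pack only the non-NaN elements, each prefixed with its index."""
    return b"".join(
        RECORD.pack(index, value)
        for index, value in enumerate(values)
        if not math.isnan(value)
    )


def unpack_sparse(blob, length):
    """Rebuild the full vector, filling the missing positions with NaN."""
    values = [float("nan")] * length
    for index, value in RECORD.iter_unpack(blob):
        values[index] = value
    return values


vector = [float("nan")] * 10
vector[3] = 42.5

blob = pack_sparse(vector)
restored = unpack_sparse(blob, len(vector))
print(len(blob), "bytes for", len(vector), "elements")
```

With this layout the example vector takes 12 bytes instead of 80. A fully 
populated vector takes 12 bytes per element instead of 8, so the saving 
depends on how sparse the vectors are.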

Personally, I don't think that any of these solutions is ideal, but I think 
that the solution in the patch presents a fairly foolproof way of storing the 
data. Of course, I am more than open to suggestions, but I think it will always 
be a trade-off between elegance, size, speed and accuracy.


- Sascha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1646/#review3914
-----------------------------------------------------------


On Jan. 15, 2013, 10:36 a.m., Andreas Hansson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1646/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2013, 10:36 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> Changeset 9499:bc23f2c316fc
> ---------------------------
> stats: Store vector stats using doubles and compress with zlib
> 
> This patch changes any arrays of values to be stored as an array of doubles,
> rather than floats in the SQL database. This is required as floats lose too
> much accuracy. For example, if the stats are read from the database, and
> injected back into gem5's stats system, then formulas can be recalculated. If
> floats are used, these formulas evaluate to be different from those originally
> calculated when creating the SQL database.
> 
> As doubles take up twice the space of a float (8 Bytes vs 4 Bytes) the SQL
> database becomes larger. The end result is that the database is larger than
> the text based output without compression. Therefore, as the vector storage is
> already not human readable we compress this field using zlib. zlib has been in
> the python standard library since version 1.5.1, so it is already covered in
> the gem5 build prerequisites.
> 
> 
> Diffs
> -----
> 
>   src/python/m5/stats/sql.py PRE-CREATION 
> 
> Diff: http://reviews.gem5.org/r/1646/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andreas Hansson
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
