> On Jan. 25, 2013, 9:58 a.m., Nilay Vaish wrote:
> > A double is 8 bytes, and each character in a text-based output is 
> > probably >= 1 byte, depending on the encoding. If the double value
> > actually holds less than 8 characters, I am surprised that a
> > float value does not suffice. What other info does the database 
> > include that is increasing its size?
> 
> Sascha Bischoff wrote:
>     The reason for switching from float to double was due to inaccuracies 
> when formulas were recalculated.
>     
>     I wrote a script which takes the stats from the SQL database and injects 
> them back into the gem5 python stats system. This allowed me to generate a 
> text-based stats file and an SQLite database for a gem5 run, then inject the 
> data back into the stats system and re-generate the text-based output to 
> ensure that the stats were being stored and retrieved correctly, i.e. that 
> the original stats.txt matched the one generated from the SQLite database. 
> When floats were used to store the data in the database, some of the formulas 
> evaluated to significantly different results as some of the accuracy was lost 
> when storing. This issue was resolved when changing the storage to double as 
> python's "float" is actually 64 bits (on most architectures/python 
> implementations).
>     
>     However, in order to minimise the number of database accesses, vector 
> stats (vector, vector2d and formulas) are stored as binary blobs in the 
> database, thereby storing all elements of the vector in one field in the 
> database. However, this has the side effect that if you have, for example, a 
> vector of length 10 with one actual value and nine NaNs, you still have to 
> store the NaNs. Naturally, if you then double the space to store each value 
> (including the NaNs) the database becomes very large.
>     
>     In my view there are two alternatives to the approach in the patch:
>     
>     1. Store each element for a vector in a separate table, and "reconstruct" 
> the vector when we want the values. This has two side effects. First of all, 
> each access requires multiple database accesses, or complex joining of tables 
> which will increase the access time. Secondly, if each element is stored by 
> itself it also needs to be stored with the ID of the stat it belongs to, the 
> index of the dump it belongs to and its position within the vector. This 
> potentially requires more space to store than the approach in this patch. 
> That said, it would allow only specific elements of the vector to be pulled 
> from the database.
>     
>     2. Manually pack the data into the blob field. We could only store the 
> data which is non-NaN by manually packing the data so that we store <index 
> within vector><value as double>. This has the advantage of only storing the 
> data we care about (although we have the additional overhead of storing the 
> index within the vector) and we could pull this data out with one database 
> access. However, we do then have the overhead of packing and unpacking the 
> data which is potentially very slow and time consuming.
>     
>     Personally I don't think that any of these solutions is ideal, but I 
> think that the solution in the patch presents a fairly foolproof way of 
> storing the data. Of course, I am more than open to suggestions, but I think 
> it will always be a trade-off between elegance, size, speed and accuracy.

The second approach is what I would personally prefer. It is pretty common to
store sparse matrices / vectors that way. Note that even compression is 'slow
and time consuming'. But I'll let you decide the approach you want to take.
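
A minimal sketch of the second approach, assuming Python 3 and a made-up
record layout (a 4-byte vector length, then one 4-byte index plus one 8-byte
double per non-NaN element). The names pack_sparse and unpack_sparse are
hypothetical and are not taken from the patch:

```python
import math
import struct

# Hypothetical record layout: little-endian uint32 index + float64 value.
# "<" disables padding, so each record is exactly 12 bytes.
_ENTRY = struct.Struct("<Id")


def pack_sparse(values):
    """Pack only the non-NaN elements as (index, value) records."""
    # The original vector length is stored first so that trailing NaNs
    # can be restored when unpacking.
    header = struct.pack("<I", len(values))
    body = b"".join(
        _ENTRY.pack(i, v) for i, v in enumerate(values) if not math.isnan(v)
    )
    return header + body


def unpack_sparse(blob):
    """Rebuild the dense vector, filling the gaps with NaN."""
    (length,) = struct.unpack_from("<I", blob, 0)
    values = [float("nan")] * length
    for i, v in _ENTRY.iter_unpack(blob[4:]):
        values[i] = v
    return values
```

For the example in the thread (a vector of length 10 with one actual value and
nine NaNs) this blob is 16 bytes, against 80 bytes for ten raw doubles. The
result could still be passed through zlib if that turns out to be worthwhile.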


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1646/#review3914
-----------------------------------------------------------


On Jan. 15, 2013, 10:36 a.m., Andreas Hansson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1646/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2013, 10:36 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> Changeset 9499:bc23f2c316fc
> ---------------------------
> stats: Store vector stats using doubles and compress with zlib
> 
> This patch changes any arrays of values to be stored as an array of doubles,
> rather than floats in the SQL database. This is required as floats lose too
> much accuracy. For example, if the stats are read from the database, and
> injected back into gem5's stats system, then formulas can be recalculated. If
> floats are used, these formulas evaluate to be different from those originally
> calculated when creating the SQL database.
> 
> As doubles take up twice the space of a float (8 Bytes vs 4 Bytes) the SQL
> database becomes larger. The end result is that the database is larger than
> the text based output without compression. Therefore, as the vector storage is
> already not human readable we compress this field using zlib. zlib has been in
> the python standard library since version 1.5.1, so it is already covered in
> the gem5 build prerequisites.
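
The two points in this description can be reproduced with the standard library
alone. The following is only a sketch under assumptions: the helper names
(roundtrip, pack_dense, unpack_dense) are hypothetical and the byte layout is
not necessarily the one used in sql.py.

```python
import struct
import zlib


def roundtrip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def pack_dense(values):
    """Pack every element (NaNs included) as float64, then zlib-compress."""
    raw = struct.pack("<%dd" % len(values), *values)
    return zlib.compress(raw)


def unpack_dense(blob):
    """Inverse of pack_dense: decompress and decode the float64 array."""
    raw = zlib.decompress(blob)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# A value such as 0.1 survives a 64-bit round trip unchanged, but not a
# 32-bit one, which is the kind of loss that makes recalculated formulas
# differ from the originals.
assert roundtrip(0.1, "<d") == 0.1
assert roundtrip(0.1, "<f") != 0.1
```

A vector that is mostly NaN packs into a highly repetitive byte string, so
zlib recovers much of the space that the switch to doubles costs.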
> 
> 
> Diffs
> -----
> 
>   src/python/m5/stats/sql.py PRE-CREATION 
> 
> Diff: http://reviews.gem5.org/r/1646/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andreas Hansson
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
