Hi everyone,

I store intraday market data for financial instruments in HDF5 files.
Recently I changed the file format, thinking that the previous format
wasted space. The new format should take less space, but the files are
twice as big. The two layouts are as follows:

Old Format
---------------
SECURITY1
    - QUOTES
    - TRADES

In the old format, each group is named after the security and contains
2 datasets, QUOTES and TRADES. QUOTES is a 2D array where the number of
columns is 4 * (depth of order book) + 1 for the timestamp, so for
Australia it is 100 columns since we have 20 levels of market data.

The dataset looks as follows:

Timestamp Bid0 Ask0 Bidsize0 Asksize0 Bid1 Ask1 Bidsize1 Asksize1
Bid2 Ask2 Bidsize2 Asksize2 Bid3 Ask3 Bidsize3 Asksize3 ...
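A minimal Python sketch of the old QUOTES row layout, assuming exactly the fields listed in the header above (the function name `old_quotes_columns` is hypothetical). Taken literally, the 4 * depth + 1 formula gives 81 columns for 20 levels, so the 100 quoted above presumably includes fields not shown in the header.

```python
def old_quotes_columns(depth: int) -> list[str]:
    """Column names for one row of the old-format QUOTES dataset.

    Layout as described in the post: a timestamp, then
    Bid/Ask/Bidsize/Asksize for each order-book level.
    """
    cols = ["Timestamp"]
    for level in range(depth):
        cols += [f"Bid{level}", f"Ask{level}", f"Bidsize{level}", f"Asksize{level}"]
    return cols


cols = old_quotes_columns(20)
print(len(cols))   # 81 = 4 * 20 + 1
print(cols[:9])    # Timestamp plus the four fields of levels 0 and 1
```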

In this format, whenever any single value changed, I wrote a whole new
row, even for a single change. I thought this was very inefficient, so I
changed the format as follows:
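The old write pattern can be modelled in a few lines, assuming 20 levels and float64 values; `apply_update_old` and the in-memory `rows` list are hypothetical stand-ins for appending to the QUOTES dataset. The point it illustrates is that one changed value costs a full 81-value row.

```python
# Hypothetical in-memory model of the old write pattern: the order book is
# kept as one flat snapshot row, and *every* update appends a full copy.
DEPTH = 20
FIELDS_PER_LEVEL = 4                        # Bid, Ask, Bidsize, Asksize
ROW_WIDTH = 1 + FIELDS_PER_LEVEL * DEPTH    # timestamp + all levels = 81


def apply_update_old(rows, snapshot, ts, level, field, value):
    """Change one value, then append the whole snapshot as a new row."""
    snapshot[0] = ts
    snapshot[1 + level * FIELDS_PER_LEVEL + field] = value
    rows.append(list(snapshot))   # full row written even for one change


rows = []
snapshot = [0.0] * ROW_WIDTH
apply_update_old(rows, snapshot, ts=1.0, level=3, field=0, value=101.5)
apply_update_old(rows, snapshot, ts=2.0, level=0, field=2, value=500.0)
print(len(rows), len(rows[0]))                     # 2 81
print(len(rows) * ROW_WIDTH * 8, "bytes as float64")  # 1296 bytes as float64
```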

New Format
------------------

SECURITY1
   -LEVEL1
   -LEVEL2
   -LEVEL3
   -LEVEL4
   -LEVEL5
   .
   .
   .
   -LEVEL20
   -TRADES

Now, depending on which level is updated, I add a row only to the
corresponding dataset. This means I write much less data than with the
previous format.
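A sketch of the new write pattern under the same assumptions, additionally assuming that each LEVELn row holds a timestamp plus the four fields of that level (the post does not spell out the row layout). `apply_update_new` is hypothetical, and a dict of lists stands in for the per-level datasets.

```python
from collections import defaultdict


def apply_update_new(datasets, ts, level, bid, ask, bidsize, asksize):
    """Append one 5-value row to the dataset for this level only.

    `level` is 1-based to match the LEVEL1..LEVEL20 dataset names.
    """
    datasets[f"LEVEL{level}"].append([ts, bid, ask, bidsize, asksize])


datasets = defaultdict(list)
apply_update_new(datasets, 1.0, 4, 101.5, 101.6, 300, 200)
apply_update_new(datasets, 2.0, 1, 100.0, 100.1, 500, 700)
total_values = sum(len(r) for rows in datasets.values() for r in rows)
print(sorted(datasets), total_values)   # ['LEVEL1', 'LEVEL4'] 10
```

For the same two updates this writes 10 values instead of 2 * 81 = 162, which is the raw-data saving the new layout was meant to deliver.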

But my output HDF5 file is twice as large with the new format.

Both files are compressed the same way:

Repack.exe -f GZIP=5 -l CHUNK=2056x1 sourcefile targetfile
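With `-l CHUNK=2056x1`, every dataset is split into chunks of 2056 rows by 1 column, and HDF5 stores each chunk whole, so a partially filled last chunk in every column is padded with fill values before GZIP is applied. A small arithmetic sketch of that chunk grid (the helper `chunk_stats` is hypothetical, and it counts elements before compression, not bytes on disk):

```python
import math

CHUNK_ROWS, CHUNK_COLS = 2056, 1   # from "-l CHUNK=2056x1"


def chunk_stats(rows: int, cols: int):
    """Return (number of chunks, elements allocated) for a rows x cols dataset.

    Every chunk is stored whole, so the last, partially filled chunk in each
    column is padded up to CHUNK_ROWS elements before compression.
    """
    n_chunks = math.ceil(rows / CHUNK_ROWS) * math.ceil(cols / CHUNK_COLS)
    return n_chunks, n_chunks * CHUNK_ROWS * CHUNK_COLS


print(chunk_stats(10_000, 81))   # (405, 832680) for 810000 values written
print(chunk_stats(50, 5))        # (5, 10280) for only 250 values written
```

The padding itself compresses well, but each chunk is still a separate compressed stream with its own index entry, so small datasets pay a comparatively high per-chunk cost.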

I am not sure why the file would be larger, since I now write less
data. Is it because I have a large number of datasets? What am I
missing here?

Any suggestions?

Regards,

Alok Jadhav
CREDIT SUISSE AG
GAT IT Hong Kong, KVAG 67
International Commerce Centre | Hong Kong | Hong Kong
Phone +852 2101 6274 | Mobile +852 9169 7172
[email protected] | www.credit-suisse.com




_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
