Hi,

We wrote an application to store tick data in HDF5 files.

We have 8 dumpcap (Wireshark) instances capturing multicast tick data, each
writing out a pcap file every 5 minutes.

We wrote a C program which reads a pcap file and stores the data in an HDF5
file. We create one HDF5 file per day, and three datasets per instrument
using the packet table API.

We are using the following compound datatypes for these tables.

typedef struct tick_t {
        int64_t         bid_trade_value;
        int64_t         ask_value;
        int32_t         cap_sec;
        int32_t         cap_usec;
        int32_t         exg_sec;
        uint32_t        exg_seq_id;
        uint32_t        tfp_seq_id;
        uint32_t        lrt_id;
        uint32_t        bid_trade_size;
        uint32_t        ask_size;
        uint32_t        flags2;
        uint16_t        flags;
        uint8_t         bid_trade_base;
        uint8_t         ask_base;
        uint8_t         bid_trade_exg;
        uint8_t         ask_exg;
} Tick;

typedef struct lrt_t {
        hvl_t           lrt;
} Lrt;

typedef struct index_t {
        int64_t         high_value;
        int64_t         low_value;
        int64_t         open_value;
        int64_t         close_value;
        int32_t         minute;
        uint32_t        tick_start;
        uint32_t        tick_end;
        uint8_t         high_base;
        uint8_t         low_base;
        uint8_t         open_base;
        uint8_t         close_base;
} Index;

The resulting HDF5 file stats are as follows.

fut_mc1_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/fut_mc1_20121011.txt>  
fut_mc2_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/fut_mc2_20121011.txt>  
fut_mc3_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/fut_mc3_20121011.txt>  
onl_mc2_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/onl_mc2_20121011.txt>  
onl_mc3_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/onl_mc3_20121011.txt>  
onl_mc4_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/onl_mc4_20121011.txt>  
onl_mc5_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/onl_mc5_20121011.txt>  
onl_mc6_20121011.txt
<http://hdf-forum.184993.n3.nabble.com/file/n4025530/onl_mc6_20121011.txt>  

We are running 8 instances of this C program; each opens the HDF5 file and
writes the data.
The maximum size of a pcap file is 300 MB.

The problem is that processing a 5-minute pcap file and storing the data in
HDF5 takes more than 5 minutes (sometimes 30 minutes).

From timing the functions, I see that the bottleneck is the HDF5 file
writing. For each instrument I create a group
and three datasets in it. My file-writing code is as follows.

        hid_t group_id;

        /* Open the per-instrument group, creating it on first sight. */
        if (!H5Lexists(file_id, symbol, H5P_DEFAULT))
        {
                group_id = H5Gcreate(file_id, symbol, H5P_DEFAULT,
                                     H5P_DEFAULT, H5P_DEFAULT);
        }
        else
        {
                group_id = H5Gopen(file_id, symbol, H5P_DEFAULT);
        }

        /* Create or open the "ticks" packet table and append this batch. */
        if (!H5Lexists(group_id, "ticks", H5P_DEFAULT))
        {
                hid_t tick_type = H5Topen(file_id, "tick_type", H5P_DEFAULT);
                hid_t ptable = H5PTcreate_fl(group_id, "ticks", tick_type,
                                             128, -1);
                herr_t err = H5PTappend(ptable, tick_len, tick_buf);
                err = H5PTclose(ptable);
                H5Tclose(tick_type);  /* was leaked in the original code */
        }
        else
        {
                hid_t ptable = H5PTopen(group_id, "ticks");
                herr_t err = H5PTappend(ptable, tick_len, tick_buf);
                err = H5PTclose(ptable);
        }

        H5Gclose(group_id);


I need to achieve this within 5 minutes.

1. How can I make the HDF5 writing faster?

2. Is a large number of datasets a problem?

3. I am using a chunk size of 100. Can anybody suggest a more appropriate
size given the stats above?

4. If I use the low-level dataset API instead of the packet table, will I
get a write performance improvement?





--
View this message in context: 
http://hdf-forum.184993.n3.nabble.com/writing-large-number-of-data-sets-tp4025530.html
Sent from the hdf-forum mailing list archive at Nabble.com.

