Hi, Tobias, Thanks for your interest in FastBit. Based on your description, my suggestion is that you go with option 1.
John On 1/18/2011 10:20 PM, emsfeld wrote: > My apologies for resending. I just noticed that the formatting of the > data in my previous mail isnt too easy on the eye, so I fixed that: > > Hi everyone, > > I've just recently become aware of FastBit and have been toying around > with it for a while. I am very impressed with its performance on large > time series data sets. > > I have a question though in regards to as to how to structure my data in > order to achieve best performance. Basically my data is to a very large > extent homogeneous. I'd say ~90% of the entire set would span the same > number of columns and represents High Frequency market order book data/ > messages. These messages either come in the form of depth or trade > snapshots with the former comprising 90% of the dataset. > > For instances, a typical depth record would look like, > > ColumnNames: Values: > TimeStamp 20100930063000100 > TradingCode ESH0 > BidLevel1QTY 10 > BidLevel1Price 1400 > ... > AskLevel1QTY 13 > AskLevel1Price 1402 > … > > > A typical trade record would look like, > > ColumnNames: Values: > TimeStamp 20100930063000100 > TradingCode ESH0 > PriceQTY 10 > Price 1402 > > > I believe that gives me several possibilities to structure my set, ie: > 1. Use a flat format similar to this: > > ColumnNames: Values(Depth): Values(Trade): > TimeStamp 20100930063000100 20100930063000100 > TradingCode ESH0 ESH0 > TradeOrDepth D T > PriceQty 0 10 > Price 0 1402 > BidLevel1QTY 10 0 > BidLevel1Price 1400 0 > ... > AskLevel1QTY 13 0 > AskLevel1Price 1402 0 > … > > > 2. Another option would be to use a relational format with indexes, ie > > Table 1: > ColumnNames: Values: Values: > UniqueID 100 101 > TimeStamp 20100930063000100 20100930063000100 > TradeOrDepth D T > > Table 2 (DepthsTrades table): > ColumnNames: Values: Values: > ForeignID 100 101 > TradeQty 0 10 > TradePrice 0 1402 > BidPriceQty 10 0 > BidPrice 1400 0 > AskPriceQty 13 0 > AskPrice 1402 0 > Level 1 0 > > > With option 1 I'd have quite a few redundancies, especially when the > data comprises more than one level for a single timestamp. Up to 10 > levels on both sides would be normal. Since it'd affect 10% of the data > only it wouldn't be too bad though. > > With Option 2 I wouldn't have as many redundancies, but would have to > join tables when querying for anything other than date_time ranges. > Would that be a more (time) costly operation relative to option 1? I am > pretty much mostly interested in getting the best retrieval speeds, plus > from what I understand, FastBit's compression would be handling a bunch > of zeros rather well storage size wise. > > I'd appreciate your thoughts. > > Regards > Tobias > > > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
