Hi, Tobias,

Thanks for your interest in FastBit.  Based on your description, my 
suggestion is that you go with option 1.

John


On 1/18/2011 10:20 PM, emsfeld wrote:
> My apologies for resending. I just noticed that the formatting of the
> data in my previous mail isn't too easy on the eye, so I fixed that:
>
> Hi everyone,
>
> I've just recently become aware of FastBit and have been toying around
> with it for a while. I am very impressed with its performance on large
> time series data sets.
>
> I do have a question, though, regarding how to structure my data in
> order to achieve the best performance. My data is to a very large
> extent homogeneous: roughly 90% of the entire set spans the same
> number of columns and represents high-frequency market order book
> messages. These messages come in the form of either depth or trade
> snapshots, with the former comprising 90% of the dataset.
>
> For instance, a typical depth record would look like:
>
> ColumnNames:     Values:
> TimeStamp        20100930063000100
> TradingCode      ESH0
> BidLevel1QTY     10
> BidLevel1Price   1400
> ...
> AskLevel1QTY     13
> AskLevel1Price   1402
> ...
>
>
> A typical trade record would look like:
>
> ColumnNames:   Values:
> TimeStamp      20100930063000100
> TradingCode    ESH0
> PriceQTY       10
> Price          1402
>
>
> I believe that gives me several possibilities for structuring my set, e.g.:
> 1. Use a flat format similar to this:
>
> ColumnNames:     Values (Depth):      Values (Trade):
> TimeStamp        20100930063000100    20100930063000100
> TradingCode      ESH0                 ESH0
> TradeOrDepth     D                    T
> PriceQty         0                    10
> Price            0                    1402
> BidLevel1QTY     10                   0
> BidLevel1Price   1400                 0
> ...
> AskLevel1QTY     13                   0
> AskLevel1Price   1402                 0
> ...
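To make option 1 concrete, the flattening step can be sketched in plain Python. This is an illustration of the schema only, not FastBit's API; the column names are taken from the example above, and only level 1 is shown.

```python
# Sketch of option 1: map heterogeneous depth/trade messages onto one
# wide schema, zero-filling the columns that do not apply to a message
# type. Plain Python illustration; a real schema would repeat the
# Bid/Ask columns for levels 2..10.

FLAT_COLUMNS = [
    "TimeStamp", "TradingCode", "TradeOrDepth",
    "PriceQty", "Price",
    "BidLevel1QTY", "BidLevel1Price",
    "AskLevel1QTY", "AskLevel1Price",
]

def flatten(msg):
    """Return one flat row, zero-padding the fields absent from msg."""
    row = dict.fromkeys(FLAT_COLUMNS, 0)
    row.update(msg)
    return [row[c] for c in FLAT_COLUMNS]

trade = {"TimeStamp": 20100930063000100, "TradingCode": "ESH0",
         "TradeOrDepth": "T", "PriceQty": 10, "Price": 1402}
depth = {"TimeStamp": 20100930063000100, "TradingCode": "ESH0",
         "TradeOrDepth": "D", "BidLevel1QTY": 10, "BidLevel1Price": 1400,
         "AskLevel1QTY": 13, "AskLevel1Price": 1402}

print(flatten(trade))  # trade row: the depth columns are all 0
print(flatten(depth))  # depth row: PriceQty/Price are 0
```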
>
>
> 2. Another option would be to use a relational format with indexes, e.g.:
>
> Table 1:
> ColumnNames:   Values:              Values:
> UniqueID       100                  101
> TimeStamp      20100930063000100    20100930063000100
> TradeOrDepth   D                    T
>
> Table 2 (DepthsTrades table):
> ColumnNames:   Values:  Values:
> ForeignID      100      101
> TradeQty       0        10
> TradePrice     0        1402
> BidPriceQty    10       0
> BidPrice       1400     0
> AskPriceQty    13       0
> AskPrice       1402     0
> Level          1        0
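To show the extra work option 2 implies, here is the join as a plain-Python sketch: a hash join on the UniqueID/ForeignID columns from the example. This is not FastBit code; it only illustrates the per-row lookup that option 1's single table avoids.

```python
# Sketch of the join option 2 requires: Table 1 holds message metadata,
# Table 2 the payload, linked by UniqueID/ForeignID as in the example.
# A plain Python hash join, illustrating the extra per-row lookup.

table1 = [
    {"UniqueID": 100, "TimeStamp": 20100930063000100, "TradeOrDepth": "D"},
    {"UniqueID": 101, "TimeStamp": 20100930063000100, "TradeOrDepth": "T"},
]
table2 = [
    {"ForeignID": 100, "TradeQty": 0, "TradePrice": 0, "BidPriceQty": 10,
     "BidPrice": 1400, "AskPriceQty": 13, "AskPrice": 1402, "Level": 1},
    {"ForeignID": 101, "TradeQty": 10, "TradePrice": 1402, "BidPriceQty": 0,
     "BidPrice": 0, "AskPriceQty": 0, "AskPrice": 0, "Level": 0},
]

def select_trades(meta, payload):
    """Find trade messages in meta, then fetch each payload row by ID."""
    by_id = {row["ForeignID"]: row for row in payload}  # build hash table
    return [by_id[row["UniqueID"]]                      # probe per match
            for row in meta if row["TradeOrDepth"] == "T"]

for row in select_trades(table1, table2):
    print(row["TradePrice"])  # prints: 1402
```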
>
>
> With option 1 I'd have quite a few redundancies, especially when the
> data comprises more than one level for a single timestamp; up to 10
> levels on both sides would be normal. Since it would only affect 10%
> of the data, though, it wouldn't be too bad.
>
> With option 2 I wouldn't have as many redundancies, but I would have
> to join tables when querying for anything other than date_time
> ranges. Would that be a more costly operation (in time) relative to
> option 1? I am mostly interested in getting the best retrieval
> speeds, and from what I understand, FastBit's compression would
> handle a bunch of zeros rather well in terms of storage size.
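On the compression point: FastBit's bitmap indexes use word-aligned run-length compression (the WAH scheme), so an index over a column that is mostly zeros contains long uniform bit runs that collapse to a handful of words. A toy run-length encoder (plain Python, nothing like FastBit's actual on-disk format) shows the effect:

```python
# Toy run-length encoder illustrating why zero-heavy columns index
# cheaply under run-length-style compression such as FastBit's WAH.
# Illustration only, not FastBit's actual encoding.

from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

# A TradeQty-style column where ~90% of rows are depth records stored as 0.
column = [0] * 9000 + [10, 12, 9] + [0] * 1000
runs = run_length_encode(column)
print(len(column), "values ->", len(runs), "runs")  # prints: 10003 values -> 5 runs
```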
>
> I'd appreciate your thoughts.
>
> Regards
> Tobias
>
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users