[This message was posted by Dimitry  London of Morgan Stanley <[EMAIL 
PROTECTED]> to the "FAST Protocol" discussion forum at 
http://fixprotocol.org/discuss/46. You can reply to it on-line at 
http://fixprotocol.org/discuss/read/46eb6ef0 - PLEASE DO NOT REPLY BY MAIL.]

I made some additional performance improvements by eliminating unnecessary 
memcpys. While I did see a marginal improvement in latency, in terms of CPU 
the FAST decoder is at 20% while the non-FAST decoder is never above 13%, and 
is mostly at 10%, at the rate of 50 messages per second. I can probably 
squeeze out a bit more, but I think my implementation is almost optimal.

I don't know yet what happens at higher message rates, but are these CPU 
stats expected? In more general terms, what is the expectation for FAST in 
terms of CPU compared to a similar non-FAST implementation? Is a 7-10% CPU 
increase expected?

For example, does anyone have similar stats for decoding the CME non-FAST 
feed vs the CME FAST feed?

Thanks,
Dimitry

> Thanks, Anders. The issue is that a process parsing an ASCII data
> stream and creating strings is taking a lot less CPU than a FAST
> decoder parsing a much smaller data stream and using mostly native
> (non-string) types.
> 
> Here are some specifics:
> 
> 1. The Level2 data contains 4 market data entries, with Price, Time,
>    and Size specified for each entry, and with the first entry also
>    containing 2 strings for tick and market. Thus, the total number of
>    fields is 14, with 2 strings and 12 native types.
> 
> 2. The non-FAST decoder receives an ASCII stream containing the above
>    data in a proprietary string format, parses it, and copies out 14
>    strings, one for each value. The length of the ASCII message is
>    about 200 bytes.
> 
> 3. The FAST decoder receives a FAST message of a length of 60 bytes, and
>    as it decodes each field, constructs an object for each of these
>    fields. IOW, the same 14 fields are eventually constructed, with 12
>    native fields and 2 strings. This process takes almost twice as much
>    CPU as #2 above.
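
As background for the per-field work described in item 3: FAST integers are stop-bit encoded, with 7 data bits per byte and the high bit set on the last byte of the field. The following is a minimal C++17 sketch of decoding one unsigned field from an in-memory buffer. The name `decode_u32` and its signature are hypothetical and not taken from the implementation discussed here, and overflow checks for over-long encodings are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: decodes one stop-bit encoded unsigned integer
// starting at buf[pos]. Each byte contributes 7 data bits, most
// significant group first; a set high bit (0x80) marks the final byte.
// On success pos is advanced past the field. If the buffer ends before a
// stop bit is seen, std::nullopt is returned and pos is left unchanged.
// Note: no check is made for encodings longer than 32 bits.
std::optional<std::uint32_t> decode_u32(const std::uint8_t* buf,
                                        std::size_t len,
                                        std::size_t& pos) {
    assert(buf != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = pos; i < len; ++i) {
        value = (value << 7) | (buf[i] & 0x7Fu);
        if (buf[i] & 0x80u) {      // stop bit: this was the last byte
            pos = i + 1;
            return value;
        }
    }
    return std::nullopt;           // ran out of input mid-field
}
```

Even in this small form, every byte costs a mask, a shift, and a branch, which is work that a plain copy of an ASCII substring does not have to do.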
> 
> Here is a deserialized template snippet:
> 
> Group name=MDIncRefresh|ID=0(ROOT)
> 
> |fld id=35|fld type=Primitive|type=String|op=Constant|Value: X
> |fld id=8|fld type=Primitive|type=String|op=Constant|Value: FIX.5.0.SP1
> |fld id=34|fld type=Primitive|type=U32|op=None|
> |fld id=52|fld type=Primitive|type=U32|op=None|
> 
> Sequence name=MDEntries|ID=1|
> 
> |fld id=268|fld type=Sequence|type=U32|op=None|
> |fld id=279|fld type=Primitive|type=String|op=Copy|flags=Optional
> |fld id=269|fld type=Primitive|type=String|op=Copy|flags=Optional
> |fld id=278|fld type=Primitive|type=String|op=Copy|flags=Optional
> |fld id=270|fld type=Primitive|type=Decimal|op=Copy|flags=Optional
> |fld id=271|fld type=Primitive|type=U32|op=Copy|flags=Optional
> 
> Group name=Level2|ID=2|
> 
> |fld id=346|fld type=Primitive|type=U32|op=Copy|flags=Optional
> |fld id=290|fld type=Primitive|type=U32|op=Copy|flags=Optional
> 
> End of Group Name=Level2|ID=2
> 
> Group name=Instrument|ID=3
> 
> |fld id=22|fld type=Primitive|type=String|op=Constant|flags=Group|Value: 122
> |fld id=48|fld type=Primitive|type=String|op=Copy|flags=Group
> 
> End of Template name=MDIncRefresh|ID=30
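
Most fields in the template above use the Copy operator, so it may help to see what that implies per field on the decode side. The following is a simplified C++17 sketch, not the poster's code: `CopyU32` is a hypothetical type, it ignores initial values, and it treats the "undefined" and "empty" dictionary states as the same thing. The caller is assumed to have read the presence-map bit and, if the bit is set, to have already decoded the value from the stream (with `std::nullopt` standing in for an encoded NULL on an optional field).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical dictionary slot for one U32 field with op=Copy.
struct CopyU32 {
    // Last value seen for this field; nullopt means no value is held.
    std::optional<std::uint32_t> previous;

    // pmap_bit:   presence-map bit for this field.
    // wire_value: value decoded from the stream when pmap_bit is set
    //             (nullopt models an encoded NULL); must be nullopt when
    //             pmap_bit is clear, since nothing was sent.
    std::optional<std::uint32_t> apply(bool pmap_bit,
                                       std::optional<std::uint32_t> wire_value) {
        assert(pmap_bit || !wire_value.has_value());
        if (pmap_bit) {
            previous = wire_value;  // new value replaces the stored one
        }
        return previous;            // otherwise reuse the stored value
    }
};
```

The point of the sketch is that each Copy field adds a presence-map test and a read or write of per-field state on top of the raw value decode, which is one plausible source of the extra CPU relative to a straight ASCII parse.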
> 
> 
> > > Hi, everyone,
> > >
> > > what are the expectations for the FAST encoder and decoder with
> > > respect to CPU utilization? My implementation encodes approximately
> > > 90K+ messages per second and decodes 120K+ messages per second, with
> > > compression ratio of 3x (for Level2 data).
> > >
> > > However, the bad news is that the FAST decoder uses more CPU than
> > > its non-FAST peer, and the gap widens at higher rates. At 50K
> > > messages per second, FAST utilizes over 2x the CPU (20% compared to
> > > 10%).
> > >
> > > Is this really expected? I was hoping that since less data travels
> > > over the network, the CPU utilization would be comparable (as
> > > latency is).
> > >
> > > Thanks, Dimitry
> >
> > Dimitry, differences in message layout, e.g., the use of fewer and
> > larger MarketDataIncrementalRefresh messages with multiple internal
> > entries in contrast with a feed using more but smaller messages, make
> > it hard to compare feeds on a messages-per-second basis. One option
> > is to measure CPU use vs fields per second or FAST encoded bits per
> > second, both as a way of comparing implementations. Using bits per
> > second also makes it possible to compare processing complexity for
> > different feeds. I think that we have measured a factor of about five
> > in messages per second decoded depending on what data set is used.
> > Also, we have seen a factor of five to ten in encoder/decoder
> > performance comparing the original FAST 1.0 sample code to more
> > heavily optimized implementations.
> >
> > Cheers, Anders
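
The normalization suggested in the quoted reply can be worked through with the figures given in this thread: 50K messages per second, 14 fields per message, about 60 bytes per FAST message versus about 200 bytes per ASCII message, and 20% versus 10% CPU. This is only a sketch, and it assumes all of those figures describe the same test run, which the thread does not state outright. The struct and function names are hypothetical.

```cpp
#include <cassert>

// One measurement of a decoder under load (all names are illustrative).
struct FeedSample {
    double msgs_per_sec;    // decoded messages per second
    double fields_per_msg;  // fields produced per message
    double bytes_per_msg;   // encoded size of one message on the wire
    double cpu_percent;     // CPU utilization observed at that rate
};

// CPU percent spent per one million decoded fields per second.
double cpu_per_million_fields(const FeedSample& s) {
    assert(s.msgs_per_sec > 0 && s.fields_per_msg > 0);
    return s.cpu_percent / (s.msgs_per_sec * s.fields_per_msg / 1e6);
}

// CPU percent spent per one megabit per second of encoded input.
double cpu_per_mbit(const FeedSample& s) {
    assert(s.msgs_per_sec > 0 && s.bytes_per_msg > 0);
    return s.cpu_percent / (s.msgs_per_sec * s.bytes_per_msg * 8.0 / 1e6);
}
```

With `FeedSample{50000, 14, 60, 20}` for FAST and `FeedSample{50000, 14, 200, 10}` for ASCII, the per-field cost comes out at roughly 28.6 versus 14.3 CPU percent per million fields per second, and the per-bit cost at roughly 0.83 versus 0.125 CPU percent per Mbit/s. On those numbers the FAST decoder does about twice the work per field but far more work per input bit, which is consistent with a denser encoding.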

