On 8/27/09 8:42 AM, "Doug Cutting" <[email protected]> wrote:

> Eelco Hillenius wrote:
>>> reading the records from a local MySQL database and instantiating the
>>> event objects in 4.5 minutes on my MBP. Reading in and instantiating
>>> those events from the log files again takes 1.3 minutes.
>> 
>> Last time I'll bug you guys with this, but after some optimization on
>> my part, I cut it back to 2.6 minutes write and 42 seconds read time.
> 
> Thanks for this data!
> 
> It would be interesting to see how much using generic or specific
> representations would change these times.
> 
> Doug
> 

It would definitely be nice to set up some tests to compare various usage
patterns of the API.  Comparing against things like Protocol Buffers and
Thrift is useful, but perhaps more interesting is comparing against
SequenceFile or other core Hadoop formats.
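
For the Avro side, a minimal timing harness might look like the sketch
below.  This is just my guess at a starting point, not Eelco's actual
setup: the file name, the record reuse, and the throughput math are all
assumptions, and it assumes the events live in an Avro data file.

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroReadBench {
  public static void main(String[] args) throws Exception {
    File file = new File(args[0]);  // e.g. an events.avro data file
    // GenericDatumReader picks up the schema from the file itself.
    DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(file,
            new GenericDatumReader<GenericRecord>());
    long start = System.nanoTime();
    long count = 0;
    GenericRecord record = null;
    while (reader.hasNext()) {
      record = reader.next(record);  // reuse one record to cut allocation
      count++;
    }
    reader.close();
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("%d records in %.1fs (%.1f MB/s)%n",
        count, secs, file.length() / 1e6 / secs);
  }
}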

Decoding at ~3 MB/sec seems rather slow to me (a 121 MB log file
instantiated to objects in ~40 secs).  For comparison, creating tuple
objects from a Hadoop SequenceFile is ~5x faster.  Granted, I'm comparing
apples to oranges (my objects in a SequenceFile versus Eelco's test in
Avro).
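
The SequenceFile side of such a test could be a sketch along these lines
(again an assumption on my part, not the benchmark I actually ran; the
path comes from the command line and the key/value classes from the file
header):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileReadBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    // Instantiate key/value objects of whatever types the file declares.
    Writable key =
        (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value =
        (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    long start = System.nanoTime();
    long count = 0;
    while (reader.next(key, value)) {  // reuses the same key/value objects
      count++;
    }
    reader.close();
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("%d records in %.1fs%n", count, secs);
  }
}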

This would depend a lot on the objects themselves, the schema, whether the
generic or specific representation is used, etc.
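
For instance, switching a test from the generic to the specific
representation should just be a matter of swapping the reader; something
like the snippet below, where Event stands in for a hypothetical class
produced by the Avro schema compiler:

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

// Event is a hypothetical generated class for the event schema.
File file = new File("events.avro");
DataFileReader<Event> reader =
    new DataFileReader<Event>(file, new SpecificDatumReader<Event>(Event.class));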
