I was optimizing a log-parsing script today and found
DateTime::Format::Strptime to be the bottleneck. Out of curiosity I wrote
a simple benchmark. The code is here:
    http://pastebin.com/PU8nXGPW 

The results are interesting, as they show fairly noticeable differences:
    http://pastebin.com/S4rt6bYd
(run on Ubuntu 12.04 with perl 5.14.2 and the Ubuntu-packaged DateTime
modules)

While it is natural that "try many approaches" parsers like Natural or
Flexible are slow (though DateParse still does much better), I find it
confusing that Strptime is so slow. The ISO8601 parser also has a fairly
strict syntax to handle, so I expected it to perform better...

Any insights? And which parsing method would you recommend for optimal
performance?
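
For reference, the kind of comparison I'm timing looks roughly like the
sketch below (a minimal reconstruction rather than the exact pastebin
code; the timestamp strings, the pattern and the Benchmark setup here
are just placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use DateTime::Format::Strptime;
    use DateTime::Format::ISO8601;

    # Sample timestamps in the two syntaxes being parsed.
    my $log_ts = '2012-07-14 16:30:48';
    my $iso_ts = '2012-07-14T16:30:48';

    # Strptime parser is built once, with a fixed pattern.
    my $strp = DateTime::Format::Strptime->new(
        pattern  => '%Y-%m-%d %H:%M:%S',
        on_error => 'croak',
    );

    # Compare parses per second over ~2 CPU seconds each.
    cmpthese( -2, {
        strptime => sub { $strp->parse_datetime($log_ts) },
        iso8601  => sub { DateTime::Format::ISO8601->parse_datetime($iso_ts) },
    } );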

~~~~ Sidenote ~~~~~

DateTime::Format::Builder does not give any easy way to treat the
sub-second part with float semantics (treat .12 as 120 milliseconds,
treat .1347 as 134700 microseconds, etc.), even though this is the most
natural and arguably the only sensible semantics.

As you can see from my code and its results, DateTime::Format::Flexible
falls into this trap (it treats those digits as nanoseconds even if there
are fewer than 9 of them), while my hand-made builders require
postprocessing to clean this field up.
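
For concreteness, the postprocessing I mean is roughly the helper below
(a hypothetical sub of my own, just to illustrate the float semantics,
not anything Builder provides): right-pad the captured digits to nine
places and use that as the nanosecond value.

    # Interpret captured sub-second digits as a decimal fraction of a
    # second and convert them to nanoseconds, instead of taking them
    # verbatim as a nanosecond count.
    sub frac_digits_to_nanoseconds {
        my ($digits) = @_;              # e.g. '12' from '.12', or '1347'
        return 0 unless defined $digits && length $digits;
        # Right-pad with zeros to 9 digits:
        #   '12'   -> 120_000_000 ns (120 milliseconds)
        #   '1347' -> 134_700_000 ns (134700 microseconds)
        return 0 + substr( $digits . ( '0' x 9 ), 0, 9 );
    }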

Am I missing something? Is there a way to handle this better?
