-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, results...

So, I used the test data set :

https://svn.apache.org/repos/asf/incubator/devicemap/trunk/openddr/test-data/src/main/resources/test-data/dmap_20130522.txt

leaving aside the desktop issue and the other stuff like bots and
plain junk strings...

When everything was set up it *flew* through the 47k+ UA strings in 35
seconds!

The result data can be found here :

http://www.ducis.net/static/result_20130625.zip

It is a pipe-separated file with a header row naming these columns:

Parser : time taken in ms
DMap : DeviceMapClient claimed device
UserAgent : useragent string
OpenDdr : 'Old' openddr claimed device
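
For anyone who wants to read the file without a database first, here is a
minimal Python sketch. It assumes the header row uses exactly the four
names listed above and that no user-agent string contains a literal pipe;
the two sample records are made up for illustration and are not taken from
the result file.

```python
import csv
import io


def read_results(stream):
    """Yield one dict per record from a pipe-separated file with a header row."""
    # QUOTE_NONE: treat quote characters inside UA strings as plain text.
    reader = csv.DictReader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        row["Parser"] = float(row["Parser"])  # time taken in ms
        yield row


# Made-up sample in the assumed layout (hypothetical device ids).
sample = io.StringIO(
    "Parser|DMap|UserAgent|OpenDdr\n"
    "0.7|iPhone|Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)|iPhone\n"
    "0.4|unknown|SomeBot/1.0|unknown\n"
)
rows = list(read_results(sample))
```

With the real file, the `io.StringIO` sample would be replaced by an opened
file handle, e.g. `open(path, newline="", encoding="utf-8")`, where the
encoding is also an assumption.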

The best thing to do is to import the lot into a database...

then weed out the records WHERE DMap = 'unknown': these are devices which
no longer occur in the current XML resources *OR* cases where
DeviceMapClient.classify returned 'Nothing'.
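
The import-and-filter step could look roughly like this in Python with
SQLite. The table name `result`, the column types, and the three sample
records are assumptions for illustration only; any database will do.

```python
import sqlite3

# In-memory database for the sketch; use a file path to keep the data around.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (Parser REAL, DMap TEXT, UserAgent TEXT, OpenDdr TEXT)"
)

# Placeholder records (hypothetical device ids and UA strings).
records = [
    (0.7, "deviceA", "ua-string-1", "deviceA"),
    (0.5, "deviceB", "ua-string-2", "deviceC"),
    (0.4, "unknown", "ua-string-3", "deviceD"),
]
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?)", records)

# Weed out the records DeviceMapClient could not classify.
comparable = conn.execute(
    "SELECT DMap, OpenDdr FROM result WHERE DMap <> 'unknown'"
).fetchall()
```

Only the rows left in `comparable` are worth comparing against the 'old'
OpenDDR answer.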

This leaves 17,919 records to compare.
Of these, 5,042 (28%) match, i.e. both DeviceMapClient and the 'old'
OpenDDR agree on the DeviceId.
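
As a quick check on that percentage, a small Python sketch: `match_rate` is
a hypothetical helper that counts agreeing (DMap, OpenDdr) pairs, and the
last lines simply redo the arithmetic with the two counts reported above.

```python
def match_rate(pairs):
    """Return (matches, total, fraction) for an iterable of (dmap, openddr) pairs."""
    pairs = list(pairs)
    matches = sum(1 for dmap, openddr in pairs if dmap == openddr)
    fraction = matches / len(pairs) if pairs else 0.0
    return matches, len(pairs), fraction


# Arithmetic on the counts reported in this mail: 5,042 of 17,919.
reported = 5042 / 17919
print(f"{reported:.1%}")  # prints 28.1%
```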

I picked out a few strings and ran them in the simple console app to
double-check, and the results were identical.

Time to bring back some regex, I fear :-(

esjr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRya1NAAoJEOxywXcFLKYc1WAH/2t7eJE4r4kbH8gBYYVv9UWj
HvOzHARdv3K5iAVsKKsSgrFIP/0Rqp49INqieE79bLwrwfE8TCVgieh4LhIFa7gl
ZtihVthNrD+dWcFW6iitUL9JIS57lfe5sXow4PxIhs+2nyHTT0kjABAbWSt4pQYV
lZwU5eGQLYHwGv1tZfm7ceonm49j8HV7zXrz54IQ0R77FZXUQKMfoLYv/w7fB76R
5E/BN41Ei9XI1XkfPowlJ7L99k320T4C2z+eOIn80yDsrnhegW1+kOxljXbL7jFf
YefSkayF/Ss6/IkzMNBNJxXt33S+l4FPAit8zocjn0bKl6IPSXdAfOud9Sb7K0U=
=awDC
-----END PGP SIGNATURE-----
