So I noticed a few things. First, I think there may be something wrong with your client. For example, when I run

'Browser Mozilla/4.0 (compatible; MSIE 7.0; Windows Phone OS 7.0; Trident/3.1; IEMobile/7.0; SAMSUNG; SGH-i917)'

through the Java client, I get:

2013-06-25 14:56:42,120 [dmapjclient] classify: Browser Mozilla/4.0 (compatible; MSIE 7.0; Windows Phone OS 7.0; Trident/3.1; IEMobile/7.0; SAMSUNG; SGH-i917)'
2013-06-25 14:56:42,121 [dmapjclient] Hit candidate: samsungsgh => genericPhone
2013-06-25 14:56:42,130 [dmapjclient] Hit candidate: mozilla40compatible => desktopDevice
2013-06-25 14:56:42,131 [dmapjclient] Hit candidate: i917 => SGH-i917
Classify result: 'SGH-i917'

In your results, you get 'Sprint M370'.

Also, what DDR data are you using? I'm using OpenDDR 1.18. For these 2 user agents:

lg-t300 UNTRUSTED/1.0 Nokia7370

there are no patterns. Your .NET client found patterns, so I'm guessing we are using different DDR data.

One of the main issues I found with my client implementation is that it only supports 1 pattern per device. A lot of devices have multiple patterns, and only 1 pattern is being considered. I plan on fixing this by allowing a device to have multiple patterns.

Not sure how we are going to keep our changes in sync, since I think we have some divergence in our algorithms. I may just run your test set through the Java client and manually compare.

When we get these 3 issues straightened out, we should see a lot more parity between the algorithms. This is even with the new algorithm ignoring spaces/symbols/regex, which I plan on addressing shortly too.

________________________________
From: eberhard speer jr. <[email protected]>
To: [email protected]
Sent: Tuesday, June 25, 2013 10:46 AM
Subject: DeviceMapClient - results

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, results...

So, I used the test data set:
https://svn.apache.org/repos/asf/incubator/devicemap/trunk/openddr/test-data/src/main/resources/test-data/dmap_20130522.txt
leaving aside the desktop issue and the other stuff like bots and plain junk strings...
When everything was set up it *flew* through the +47k ua-strings in 35 seconds!

The result data can be found here:
http://www.ducis.net/static/result_20130625.zip

It is a pipe-separated file with header:

Parser    : time taken in ms
DMap      : DeviceMapClient claimed device
UserAgent : useragent string
OpenDdr   : 'Old' openddr claimed device

The best thing to do is to import the lot into a database, then weed out records WHERE DMap = 'unknown': these are devices which no longer occur in the current XML resources *OR* where DeviceMapClient.classify returned 'Nothing'.

This leaves 17,919 records to compare. Of these, 5,042 (28%) match, i.e. both DeviceMapClient and the 'old' OpenDDR agree on the DeviceId.

I picked out a few strings and ran them in the simple console app to double-check, and the results were identical.

Time to bring back some regex I fear :-(

esjr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRya1NAAoJEOxywXcFLKYc1WAH/2t7eJE4r4kbH8gBYYVv9UWj
HvOzHARdv3K5iAVsKKsSgrFIP/0Rqp49INqieE79bLwrwfE8TCVgieh4LhIFa7gl
ZtihVthNrD+dWcFW6iitUL9JIS57lfe5sXow4PxIhs+2nyHTT0kjABAbWSt4pQYV
lZwU5eGQLYHwGv1tZfm7ceonm49j8HV7zXrz54IQ0R77FZXUQKMfoLYv/w7fB76R
5E/BN41Ei9XI1XkfPowlJ7L99k320T4C2z+eOIn80yDsrnhegW1+kOxljXbL7jFf
YefSkayF/Ss6/IkzMNBNJxXt33S+l4FPAit8zocjn0bKl6IPSXdAfOud9Sb7K0U=
=awDC
-----END PGP SIGNATURE-----
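
For anyone who would rather not load the result file into a database, the comparison described in the quoted message (drop the 'unknown' rows, then count where both classifiers agree) could be sketched roughly as below. This is only an illustration: the class name ResultCompare, the method matchStats, and the sample rows are made up, and it assumes the four columns appear in the order listed in the header (Parser|DMap|UserAgent|OpenDdr) and that no user agent string itself contains a pipe.

```java
import java.util.List;

// Hypothetical helper, not part of DeviceMapClient or OpenDDR.
class ResultCompare {

    /**
     * Returns {compared, matched} for pipe-separated result lines.
     * Assumed column order: Parser | DMap | UserAgent | OpenDdr.
     * The first line is treated as the header and skipped.
     */
    static int[] matchStats(List<String> lines) {
        int compared = 0;
        int matched = 0;
        boolean header = true;
        for (String line : lines) {
            if (header) {
                header = false;
                continue;
            }
            String[] fields = line.split("\\|", -1);
            if (fields.length != 4) {
                continue; // malformed row (or a user agent containing '|')
            }
            String dmap = fields[1].trim();
            String openDdr = fields[3].trim();
            // Weed out rows where DeviceMapClient returned nothing usable.
            if (dmap.isEmpty() || dmap.equalsIgnoreCase("unknown")) {
                continue;
            }
            compared++;
            if (dmap.equals(openDdr)) {
                matched++;
            }
        }
        return new int[] {compared, matched};
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the input.
        List<String> sample = List.of(
            "Parser|DMap|UserAgent|OpenDdr",
            "1|SGH-i917|Mozilla/4.0 (compatible; MSIE 7.0)|SGH-i917",
            "2|genericPhone|SomeAgent/1.0|Nokia7370",
            "1|unknown|JunkString|desktopDevice");
        int[] stats = matchStats(sample);
        System.out.println("compared=" + stats[0] + " matched=" + stats[1]);
    }
}
```

Reading the real file would just mean passing java.nio.file.Files.readAllLines(path) to matchStats instead of the sample list.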

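
On the "1 pattern per device" issue raised in the reply at the top: one way to let a device carry several patterns is to index every pattern back to its device id, so a hit on any of them yields the device as a candidate. The sketch below is not the actual Java client code. MultiPatternIndex, addDevice and candidates are hypothetical names, the patterns are illustrative, and it only matches single normalized tokens; the log above suggests the real client also matches joined tokens such as "samsungsgh", which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the DeviceMap Java client implementation.
class MultiPatternIndex {

    // Every pattern of every device points back to its device id.
    private final Map<String, String> patternToDevice = new HashMap<>();

    /** Registers all patterns of a device instead of just one. */
    void addDevice(String deviceId, List<String> patterns) {
        for (String pattern : patterns) {
            // If two devices share a pattern, the later one wins here.
            patternToDevice.put(normalize(pattern), deviceId);
        }
    }

    /** Lowercases and strips everything except letters and digits. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    /** Returns the device ids hit by any single token of the user agent. */
    Set<String> candidates(String userAgent) {
        Set<String> hits = new LinkedHashSet<>();
        for (String token : userAgent.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String device = patternToDevice.get(token.toLowerCase(Locale.ROOT));
            if (device != null) {
                hits.add(device);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MultiPatternIndex index = new MultiPatternIndex();
        // Illustrative patterns only; not taken from the OpenDDR data.
        index.addDevice("SGH-i917", List.of("i917", "sghi917"));
        index.addDevice("genericPhone", List.of("samsung"));
        String ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows Phone OS 7.0; "
                + "Trident/3.1; IEMobile/7.0; SAMSUNG; SGH-i917)";
        System.out.println(index.candidates(ua));
    }
}
```

Picking the winner among several candidates (most specific device, longest pattern, and so on) is left out on purpose, since that is where the two clients seem to diverge.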