Eberhard, can you, and possibly others, explain the approach you use for
matching and classifying user agent strings? This will probably help guide our
approach.

When I wrote dClass, a pattern matching implementation in C, it worked with
both WURFL and OpenDDR. With WURFL, the user agents had to go through a parser
which used several techniques to break them down into basic sets of
identifiable tokens. This had problems, since a lot of devices had random user
agents containing tokens which would throw the algorithm off. This required a
lot of human tuning.

OpenDDR already has the tokens parsed, and it's done very cleanly and
accurately. This removes a large chunk of complexity from the process.

dClass indexes tokens into a dtree (decision tree) and then walks the input
string and the dtree together, looking for matching tokens. Tokens can have a
variety of attributes which tell the algorithm how to treat the match. Given
the structure of the dtree, performance will always be O(m), where m is the
length of the token being matched. Performance does not depend on n, the
number of patterns in the dtree, so the dtree has a performance profile unlike
most trees. The dtree also achieves two types of natural data compression.
First, all common prefixes are reused. Second, I implemented system pointer
compression on top of my memory allocation algorithm. These factors give it
sub-microsecond runtime performance and good memory efficiency. Finally, I
attached a set of key value pairs to each matchable token. This gives the
system the characteristics of a document oriented database (albeit a very
advanced one).
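
To make the idea concrete, here is a minimal sketch in Python of a prefix tree
that shares common prefixes and attaches key value pairs to each matchable
token. It is not the dClass code (which is C and also handles token attributes
and pointer compression); the names TokenTree, add and classify, the sample
tokens and the sample user agent are all made up for illustration.

```python
class Node:
    __slots__ = ("children", "attrs")

    def __init__(self):
        self.children = {}   # next character -> child Node (shared prefixes)
        self.attrs = None    # key value pairs if a token ends at this node


class TokenTree:
    """Hypothetical prefix tree; an illustration, not the dClass dtree."""

    def __init__(self):
        self.root = Node()

    def add(self, token, attrs):
        # Insert a token character by character, reusing existing prefixes.
        node = self.root
        for ch in token.lower():
            node = node.children.setdefault(ch, Node())
        node.attrs = dict(attrs)

    def longest_match_at(self, text, start):
        # Walk the tree and the input together; the cost depends on the
        # length of the matched token, not on how many tokens are stored.
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.attrs is not None:
                best = (text[start:i + 1], node.attrs)
        return best

    def classify(self, user_agent):
        # Scan the input left to right, collecting the longest token found
        # at each position and skipping past it once matched.
        text = user_agent.lower()
        matches = []
        i = 0
        while i < len(text):
            hit = self.longest_match_at(text, i)
            if hit:
                matches.append(hit)
                i += len(hit[0])
            else:
                i += 1
        return matches


tree = TokenTree()
tree.add("blackberry", {"vendor": "RIM"})
tree.add("blackberry 9900", {"vendor": "RIM", "model": "Bold 9900"})
tree.add("iphone", {"vendor": "Apple", "model": "iPhone"})

# Made-up user agent string, for illustration only.
ua = "Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+"
for token, attrs in tree.classify(ua):
    print(token, attrs)
```

Note that this sketch restarts the walk at every input position, so a full
scan costs roughly the input length times the longest token length. What it
shares with the description above is that the number of stored tokens never
enters into the cost of a lookup.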

I did a write up which talks more about the justification here:

http://www.rezsoft.org/device_detection/

And here:

http://mail-archives.apache.org/mod_mbox/incubator-devicemap-dev/201208.mbox/%3C3B961B5DBE03B04EAB618084BD661E4F51AEF42F%40PRTMB02.corp.weather.com%3E

dClass is pretty much in a steady state right now. When combined with OpenDDR,
it's highly accurate and extremely fast. I would like to see it become a part
of DeviceMap and maybe incorporate features from other classification
algorithms. I would also like to extend dClass into a more powerful decision
classifier (dClass+) and, with some new features, use it to tackle larger
classification problems.

Thanks,
Reza Naghibi
[email protected]


---
Sent from Blackberry Bold 9900

----- Original Message -----
From: eberhard speer jr. [mailto:[email protected]]
Sent: Thursday, December 20, 2012 05:13 AM
To: [email protected] <[email protected]>
Subject: User-Agent strings

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

a while ago I saw a request for user-agent strings.
There was some debate about IP addresses and privacy but I don't know
where things went from there.

I understand the idea is to see how 'complete' the OpenDDR resource
data is, i.e. where the gaps are.

I can contribute a list of 65,757 user-agents with the following
data/columns :

UserAgent : device user-agent string
Device : OpenDDR device Id, "unknown" for unresolved
Elapsed : time taken in ms to resolve UserAgent

I just ran the complete dataset through my test setup using the
OpenDDR 1.13 resources and OpenDDR resolver code.

This data can shed some light on gaps as well as strengths and weaknesses.

I also have a subset of this dataset -- 12,272 user-agent strings --
with :

Manufacturer
OEM Model name
UserAgent
screen-width
screen-height

So, basically this is a list of 12,272 *unique* device models, each with
a sample User-agent string !

If you like I can make the data available for download on one of my
servers.

Also, I will gladly run any set of user-agent strings made available
to me through the test set-up.

Regards,

esjr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJQ0uTFAAoJEOxywXcFLKYcCPcH/Al1xVyqIa2y1B4siOmSEIMh
6pfndVUizAKCWkVjd4j5Vn3qLLLzubxi0Js+f/IuFaOWtjS5eLK1mkXr0/nUg3b6
Qk3qbzTsTV2Gx7ZeubhuCjhnKD8orI0rmuPIpyrTccBGdsMl35BFGQWxYAuOjamI
I1tM538+H5PSFiUvfBmKxohGIG0j/GBUhIjIVhezJlC0e9ceowoM5S8GqdKlxdal
8QITiCn8F8PPXt2BzdKygZwNE6dRYdZF8vm89w50ECsdtYcuRWlo90FudAfPpZbS
0VM/M8uTOsen3LYEJGSXeUUa66cawNKx+kVwIzZ/Q2FlN/kkPnYsylxq1HV+NG4=
=9yO0
-----END PGP SIGNATURE-----
