It does not parse the user agent; it only uses more sophisticated (and, see Android etc., tailor-made) regex patterns than the current large XML parser does ;-)

On Wed, Dec 10, 2014 at 7:15 PM, Reza Naghibi < [email protected]> wrote:

> If you are saying that the OpenDDR client parses the user agent string,
> then that is something we need to avoid at all costs. I honestly was not
> aware that OpenDDR did parsing like that. Parsing the user agent has a
> whole lot of problems associated with it. The best approach, and the
> approach the current client uses, is to use pattern matching on device,
> browser, and OS signatures and use that to target specific devices,
> browsers, operating systems, and their versions.
>
> From: Werner Keil <[email protected]>
> To: [email protected]; Reza Naghibi <[email protected]>
> Sent: Wednesday, December 10, 2014 12:41 PM
> Subject: Re: 2x Performance Increase in classify()
>
> Well, it's not "legacy", it's simply the W3C compliant version, while the
> new one deviates from that.
>
> It won't recognize the OS either on the Samsung Galaxy 10.1 N, now
> upgraded to Android 4.1 or 4.2 (it still says 4.0.4, which is wrong but
> seems to differ from the XML file, so the classifier tries "something"
> but not exactly the right thing), or Android 5 on the Nexus 7. There it
> bluntly returns what's in the XML data file, "4.1", instead of the
> correct 5 also matching the UA.
>
> As the W3C client isn't on the VM, it is not so easy to test it against
> actual tablets, but providing an actual UA like those from these tablets
> by hand should work.
>
> For Nexus especially there seems to be a bug in the data files. Someone
> invented "genericGoogle", which is a loose end; neither the W3C client
> nor the new parser would find something, as the parent doesn't seem to be
> in any of the files ;-O
>
> On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi < [email protected]> wrote:
>
> > >> currently provide better recognition of say an update to Android 4
> > >> or 5
> >
> > Hmm... can you explain this in more detail?
> >
> > From my work on the legacy client, it does not do anything more than
> > matching builder strings against user agents. The legacy client had a
> > more brute force algorithm which would have to pick a particular
> > builder to use, which was error prone. The new classifier client
> > attempts to match all builders at once and then chooses the highest
> > ranking match, thus increasing the accuracy. So I am not aware of any
> > reason that one client can recognize a pattern better than the other,
> > especially if they are working off of the same data. Only the opposite
> > is possible, missing a pattern match.
> >
> > From: Werner Keil <[email protected]>
> > To: [email protected]; Reza Naghibi <[email protected]>
> > Sent: Wednesday, December 10, 2014 12:13 PM
> > Subject: Re: 2x Performance Increase in classify()
> >
> > Volkan/Reza,
> >
> > Let's keep in mind, the W3C DDR implementation has specialized
> > recognition classes like OrderedTokenDeviceBuilder or
> > TwoStepDeviceBuilder and subclasses that analyze the UserAgent more
> > thoroughly, and currently provide better recognition of say an update
> > to Android 4 or 5.
> >
> > Werner
> >
> > On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi < [email protected]> wrote:
> >
> > > Volkan,
> > >
> > > Thanks for the performance patch. I reviewed it and it looks pretty
> > > good. Pre patch, we were running each ngram set thru some raw string
> > > processing normalizations. Your patch does a good job moving that to
> > > the beginning and optimizing the regex. Good job :)
> > >
> > > As for pattern matching, if you look at the normalization method, we
> > > only look at alpha-numerics. This was done for simplicity's sake. The
> > > downside here is that we weaken any pattern which contains
> > > non-alphanumerics.
> > > There are several ways to address and fix this, but since DeviceMap
> > > has control over its own data, I prefer fixing the patterns and
> > > keeping the matching engine simple. The thing to remember is that our
> > > data came from OpenDDR which had a more complex classification
> > > algorithm and heuristics, so we kind of have a bit of legacy baggage
> > > to sort thru as this project evolves.
> > >
> > > Regarding our next release, I already have the Java client 1.1.0
> > > ready to go. I would like to get your patch in on the next release,
> > > 1.1.1.
> > >
> > > Reza
> > >
> > > From: Volkan YAZICI <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Sent: Wednesday, December 10, 2014 9:32 AM
> > > Subject: 2x Performance Increase in classify()
> > >
> > > Good news everyone!
> > >
> > > Here is the patch that introduces JMH-based benchmarks for the Java
> > > client: DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> > >
> > > And here is the patch that introduces >2x performance gain: DMAP-107
> > > <https://issues.apache.org/jira/browse/DMAP-107>
> > >
> > > *Sample output:*
> > >
> > > $ export userAgentFile=/path/to/user-agents.txt
> > > $ wc -l $userAgentFile
> > > 195325
> > > $ java \
> > >     -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> > >     -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> > >     -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> > >     ".*DeviceMapClientBenchmark.*"
> > >
> > > # Using the most recent trunk.
> > > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> > > Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
> > > Confidence interval (99.9%): [10838.781, 13320.036]
> > >
> > > # Using the enhanced classify().
> > > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> > > Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
> > > Confidence interval (99.9%): [5063.607, 5947.103]
> > >
> > > Cheers!
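
As a quick check on the ">2x" figure, the two averages quoted in the sample output (12079.408 ms/op on trunk, 5505.355 ms/op with the enhanced classify()) give a ratio of roughly 2.19, and the two 99.9% confidence intervals do not overlap. The class and method names below are made up for illustration; they are not part of DeviceMap or JMH.

```java
import java.util.Locale;

class SpeedupEstimate {
    // Ratio of baseline time to patched time; a value above 1 means the patch is faster.
    static double speedup(double baselineMs, double patchedMs) {
        return baselineMs / patchedMs;
    }

    // True when the closed intervals [lo1, hi1] and [lo2, hi2] share no point.
    static boolean disjoint(double lo1, double hi1, double lo2, double hi2) {
        return hi1 < lo2 || hi2 < lo1;
    }

    public static void main(String[] args) {
        double trunk = 12079.408;   // ms/op, trunk run from the sample output
        double patched = 5505.355;  // ms/op, enhanced classify() run

        System.out.println(String.format(Locale.ROOT, "speedup: %.2fx", speedup(trunk, patched)));
        System.out.println("confidence intervals disjoint: "
                + disjoint(10838.781, 13320.036, 5063.607, 5947.103));
    }
}
```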

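
To illustrate the point made in the thread about normalization only looking at alphanumerics, here is a minimal sketch of how such a step can weaken patterns that depend on punctuation. The `normalize` method is hypothetical and is not the DeviceMap client's actual code; lowercasing is an extra assumption made for the example.

```java
import java.util.Locale;

class AlphaNumericNormalization {
    // Hypothetical normalizer: drop everything except ASCII letters and digits,
    // then lowercase. Assumed for illustration only.
    static String normalize(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Two different version strings collapse to the same normalized form,
        // so a pattern written for "4.1" can no longer be told apart from "41".
        System.out.println(normalize("Android 4.1"));  // android41
        System.out.println(normalize("Android 41"));   // android41

        // Separators inside a model name are lost in the same way.
        System.out.println(normalize("GT-P7500").equals(normalize("gt p7500")));  // true
    }
}
```

This is one way to read the preference stated in the thread for fixing the patterns in the data while keeping the matching engine simple: once the punctuation is stripped, the patterns have to be written so that they stay distinct without it.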