Re: 2x Performance Increase in classify()

Werner Keil Wed, 10 Dec 2014 10:25:31 -0800

It does not parse the user agent; it only uses more sophisticated (and, see
Android etc., tailor-made) regex patterns than the current large XML parser
does ;-)






On Wed, Dec 10, 2014 at 7:15 PM, Reza Naghibi <
[email protected]> wrote:

> If you are saying that the OpenDDR client parses the user agent string,
> then that is something we need to avoid at all costs. I honestly was not
> aware that OpenDDR did parsing like that. Parsing the user agent has a
> whole lot of problems associated with it. The best approach, and the
> approach the current client uses, is to use pattern matching on device,
> browser, and OS signatures and use that to target specific devices,
> browsers, operating systems, and their versions.
>
>       From: Werner Keil <[email protected]>
>  To: [email protected]; Reza Naghibi <[email protected]>
>  Sent: Wednesday, December 10, 2014 12:41 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> Well, it's not "legacy"; it's simply the W3C-compliant version, while the
> new one deviates from that.
>
> It recognizes the OS neither on the Samsung Galaxy 10.1 N upgraded to
> Android 4.1 or 4.2 (it still says 4.0.4, which is wrong but seems to
> differ from the XML file, so the classifier tries "something" but not
> exactly the right thing) nor on the Nexus 7 running Android 5. There it
> bluntly returns what's in the XML data file, "4.1", instead of the correct
> 5, which also matches the UA.
>
> As the W3C client isn't on the VM it is not so easy to test it against
> actual tablets, but providing an actual UA like those from these tablets by
> hand should work.
>
> For Nexus especially there seems to be a bug in the data files. Someone
> invented "genericGoogle", which is a loose end: neither the W3C client nor
> the new parser would find anything, as the parent doesn't seem to be in any
> of the files ;-O
>
>
>
>
> On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
> [email protected]> wrote:
>
> > >> currently provide better recognition of say an update to Android 4 or 5
> >
> > Hmm... can you explain this in more detail?
> >
> > From my work on the legacy client, it does not do anything more than
> > matching builder strings against user agents. The legacy client had a more
> > brute-force algorithm which would have to pick a particular builder to use,
> > which was error-prone. The new classifier client attempts to match all
> > builders at once and then chooses the highest-ranking match, thus
> > increasing the accuracy. So I am not aware of any reason that one client
> > can recognize a pattern better than the other, especially if they are
> > working off of the same data. Only the opposite is possible: missing a
> > pattern match.
> >
> >      From: Werner Keil <[email protected]>
> >  To: [email protected]; Reza Naghibi <[email protected]>
> >  Sent: Wednesday, December 10, 2014 12:13 PM
> >  Subject: Re: 2x Performance Increase in classify()
> >
> > Volkan/Reza,
> >
> > Let's keep in mind, the W3C DDR implementation has specialized recognition
> > classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
> > subclasses that analyze the UserAgent more thoroughly, and currently
> > provide better recognition of say an update to Android 4 or 5.
> >
> > Werner
> >
> >
> >
> >
> > On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
> > [email protected]> wrote:
> >
> > > Volkan,
> > >
> > > Thanks for the performance patch. I reviewed it and it looks pretty good.
> > > Pre-patch, we were running each ngram set through some raw string processing
> > > normalizations. Your patch does a good job moving that to the beginning and
> > > optimizing the regex. Good job :)
> > >
> > > As for pattern matching, if you look at the normalization method, we only
> > > look at alphanumerics. This was done for simplicity's sake. The downside
> > > here is that we weaken any pattern which contains non-alphanumerics. There
> > > are several ways to address and fix this, but since DeviceMap has control
> > > over its own data, I prefer fixing the patterns and keeping the matching
> > > engine simple. The thing to remember is that our data came from OpenDDR,
> > > which had a more complex classification algorithm and heuristics, so we
> > > kind of have a bit of legacy baggage to sort through as this project evolves.
> > >
> > > Regarding our next release, I already have the Java client 1.1.0 ready to
> > > go. I would like to get your patch in on the next release, 1.1.1.
> > >
> > > Reza
> > >
> > >
> > >      From: Volkan YAZICI <[email protected]>
> > >  To: "[email protected]" <[email protected]>
> > >  Sent: Wednesday, December 10, 2014 9:32 AM
> > >  Subject: 2x Performance Increase in classify()
> > >
> > > Good news everyone!
> > >
> > > Here is the patch that introduces JMH-based benchmarks for Java client:
> > > DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> > >
> > > And here is the patch that introduces >2x performance gain: DMAP-107
> > > <https://issues.apache.org/jira/browse/DMAP-107>
> > >
> > > *Sample output:*
> > >
> > > $ export userAgentFile=/path/to/user-agents.txt
> > > $ wc -l $userAgentFile
> > > 195325
> > > $ java \
> > >    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> > >    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> > >    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> > >    ".*DeviceMapClientBenchmark.*"
> > >
> > > # Using the most recent trunk.
> > > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> > >  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
> > >  Confidence interval (99.9%): [10838.781, 13320.036]
> > >
> > > # Using the enhanced classify().
> > > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> > >  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
> > >  Confidence interval (99.9%): [5063.607, 5947.103]
> > >
> > >
> > > Cheers!
> > >
> > >
> > >
> >
> >
> >
>
>
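
To illustrate the "match all builders at once, then choose the
highest-ranking match" idea described in the thread above, here is a minimal
Java sketch. The Signature class, its rank field, the device ids, and the
classify method are all hypothetical and are not the DeviceMap client's
actual API; the user agent string is only an example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch; these are NOT the real DeviceMap classes.
final class RankedMatchSketch {

    /** A device signature: tokens that must all occur in the UA, plus a rank. */
    static final class Signature {
        final String deviceId;
        final List<String> tokens;
        final int rank;

        Signature(String deviceId, List<String> tokens, int rank) {
            this.deviceId = deviceId;
            this.tokens = tokens;
            this.rank = rank;
        }
    }

    /** Tests every signature against the UA and keeps the highest-ranking hit. */
    static Optional<String> classify(String userAgent, List<Signature> signatures) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        Signature best = null;
        for (Signature s : signatures) {
            boolean allTokensFound = true;
            for (String token : s.tokens) {
                if (!ua.contains(token)) {
                    allTokensFound = false;
                    break;
                }
            }
            if (allTokensFound && (best == null || s.rank > best.rank)) {
                best = s;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.deviceId);
    }

    public static void main(String[] args) {
        List<Signature> signatures = Arrays.asList(
            new Signature("genericAndroid", Arrays.asList("android"), 1),
            new Signature("nexus7", Arrays.asList("android", "nexus 7"), 10));
        String ua = "Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P)";
        System.out.println(classify(ua, signatures).orElse("unknown"));
    }
}
```

Because no single signature is picked up front, a more specific signature can
outrank a generic one whenever both match the same user agent.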
>
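
The point above about alphanumeric-only normalization weakening patterns can
be shown with a small Java sketch. It assumes a normalization that lowercases
the input and replaces each run of non-alphanumerics with a single space; the
real DeviceMap normalization method may differ in detail, and the normalize
helper here is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch; the real DeviceMap normalization may differ.
final class NormalizationSketch {

    /** Lowercases, then collapses every run of non-alphanumerics to one space. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // A pattern written with a hyphen and one written with an underscore
        // become indistinguishable once normalized.
        System.out.println(normalize("GT-P7510"));
        System.out.println(normalize("GT_P7510"));
        // The dot in a version number is lost in the same way.
        System.out.println(normalize("Android 4.1"));
    }
}
```

Under this assumption, any distinction a pattern draws by punctuation alone
disappears, which is why fixing the patterns in the data is one way to keep
the matching engine simple.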
