Also, here are two examples of pure pattern matching algorithms that work extremely well (both happen to be written by me):

-dClass [0]. This project uses pure pattern matching to do device classification. It also does OS detection and browser detection using a separate OS and browser index.

-stats.zone [1]. This is a side project I wrote which parses human language in the baseball domain. It uses pure pattern matching to accomplish language classification (NLP). This should demonstrate the power of pattern matching on a much more complicated domain, the English language.

I strongly feel that we need to embrace a fully parallel pattern matching algorithm as the path forward in this project. This is why I would like to distance this project from the legacy serial user agent parsing algorithm. Parsing user agents has failed time and time again due to complexities and an ever-changing device landscape. It's also slow, complicated, and very error prone. Pattern matching is extremely simple, extremely fast, and purely data driven. I don't expect the core algorithm to change much over the course of major releases, only the data powering the algorithm.

This project is in a very treacherous landscape since device classification is not widely accepted. Therefore we need to be as state of the art as possible. We also need to be flexible and fast.

[0] https://github.com/TheWeatherChannel/dClass
[1] http://stats.zone/

---
From: Reza Naghibi <[email protected]>
To: Devicemap-dev <[email protected]>
Sent: Friday, January 2, 2015 12:58 PM
Subject: Deleting the legacy ODDR client and related artifacts from SVN

Any objections to deleting the legacy ODDR Java client and its related artifacts from SVN? This is purely a code cleanup. Here are my thoughts on this matter:

-The legacy client was rewritten a year ago and the rewrite offers a huge set of improvements. It's simpler, several orders of magnitude faster, more predictable, and it moves all of the device logic from code to data. Basically, it's modern.
One of the biggest changes is that the legacy ODDR client loops through every pattern looking for a match, one by one, using a complicated set of heuristics specific to each class of user-agents. This does not scale. The new client is able to check all patterns in parallel using pure pattern matching. This scales extremely well.

-The DDR data can no longer evolve to support the legacy client. While the 1.0.x releases may work, once 2.0 is released, the legacy client will in no way, shape, or form still work.

-The legacy client is a distraction. It's taking focus away from moving our current objectives forward.

This project, like all projects, must evolve. This means rewriting clients, reformatting data, and basically throwing old things away. This is a natural process in any software development project. The same considerations must be given to old artifacts in this project. This project must evolve.

If there are no objections, I will be removing the legacy artifacts from SVN in 5 days (120 hours).
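
To make the serial versus parallel distinction above concrete, here is a minimal Python sketch. It is not the DeviceMap or dClass code. The pattern data, device ids, and function names are all hypothetical, and reading "parallel" as a token lookup in a prebuilt index is an assumption made for illustration. The serial version tests patterns one at a time, so its cost grows with the number of patterns. The indexed version tokenizes the user agent once and looks each token up in a hash table, so its cost depends on the length of the user agent.

```python
import re

# Hypothetical pattern data (device id -> identifying token).
# Real DDR data is much richer; this exists only for illustration.
PATTERNS = {
    "apple_iphone": "iphone",
    "apple_ipad": "ipad",
    "samsung_gt_i9300": "gt-i9300",
}

def classify_serial(user_agent):
    """Legacy-style sketch: test every pattern in turn (O(number of patterns))."""
    ua = user_agent.lower()
    for device_id, token in PATTERNS.items():
        if token in ua:
            return device_id
    return None

# Assumed stand-in for "checking all patterns at once":
# invert the data into a token -> device id index, built once up front.
INDEX = {token: device_id for device_id, token in PATTERNS.items()}

def tokenize(user_agent):
    """Split a user agent into lowercase tokens, keeping hyphens inside tokens."""
    return [t for t in re.split(r"[^a-z0-9\-]+", user_agent.lower()) if t]

def classify_indexed(user_agent):
    """Index-based sketch: one hash lookup per token in the user agent."""
    for token in tokenize(user_agent):
        device_id = INDEX.get(token)
        if device_id is not None:
            return device_id
    return None

if __name__ == "__main__":
    ua = "Mozilla/5.0 (Linux; Android 4.3; GT-I9300 Build/JSS15J) AppleWebKit/537.36"
    print(classify_serial(ua), classify_indexed(ua))
```

Note that the two functions are not equivalent in general: the serial sketch matches substrings, while the indexed sketch matches whole tokens. In both, adding a new device means adding data, not code.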
