Hi, I'm Niels Basjes: the author of Yauaa, an Apache Avro committer, and a colleague of Volkan.
Werner Keil <werner.k...@gmail.com> wrote:
> It's hardly worth it for DeviceMap VM, but it seems YAUAA also has some
> nice log analysis tool to use on NGINX or Apache web server logs to analyze
> how many users visited a particular URL or extract UA data from it:
> https://github.com/nielsbasjes/yauaa/tree/master/examples/logparser

What you found is an example of how to use the Yauaa plugin for https://github.com/nielsbasjes/logparser in Apache Pig.
The logparser project is intended to make parsing Apache HTTPD logfiles easier and, as such, has been used by many of my colleagues for quite a while.
It is pluggable (which is why you can load the Yauaa wrapper into it) and has a Loader for Apache Pig, which has been included in piggybank as a standard feature since Pig 0.16.

I have been working on parsing NGINX logfiles, but that work is currently unfinished and inactive.
If someone can send me a few example logformats for NGINX and a handful of loglines that match these formats, that would make it a lot easier and faster for me to implement this.
(A single logformat with "all" fields would be great.)

> Sounds a bit similar to what Wikipedia once did.
> If applied in the right way it could have automated gathering device data
> in ways we were often told to do (e.g. at ApacheCon or other presentations
> I gave).

Yauaa simply applies the available rules to find the desired results.
These rules are 'static' per version and essentially only "parse" the useragent.
I deliberately left out lookup tables with device numbers and the like, because that is exactly the maintenance nightmare I wanted to avoid.

Last week I briefly discussed this with a colleague, and we came to the conclusion that the "best" way is probably to have several 'large, globally distributed' websites run a piece of JavaScript that gathers information (once per visit would be enough) and posts it, together with the useragent, to a (separate) logging system.
I say globally distributed because I have noticed that the mobile devices in use vary from country to country.
This data would then need analysis and (automatic) conversion into a device map database, as a never-ending effort.

> Well too late for DeviceMap, but certainly good to know and try in a
> different place...

Yes.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes