Hi,

I'm Niels Basjes, the author of Yauaa, an Apache Avro committer and a
colleague of Volkan.

Werner Keil <werner.k...@gmail.com> wrote:

> It's hardly worth it for DeviceMap VM, but it seems YAUAA also has some
> nice log analysis tool to use on NGINX or Apache web server logs to analyze
> how many users visited a particular URL or extract UA data from it:
> https://github.com/nielsbasjes/yauaa/tree/master/examples/logparser

What you found is an example of how to use the Yauaa plugin for
https://github.com/nielsbasjes/logparser in Apache Pig.

This logparser project is intended to make parsing Apache HTTPD
logfiles easier and as such has been used by many of my colleagues for
quite a while.
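
To illustrate what parsing such a logline involves, here is a minimal
Python sketch. Note that this is NOT the logparser API: it is a
hand-written regular expression for the Apache HTTPD "combined"
LogFormat only, and the names (COMBINED, parse_combined) and the sample
line are made up for illustration.

```python
import re

# Regex for the Apache HTTPD "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Assumption: quoted fields contain no escaped double quotes.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Made-up example line, for illustration only.
SAMPLE = (
    '192.0.2.1 - - [10/Oct/2017:13:55:36 +0200] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'
)

fields = parse_combined(SAMPLE)
```

A fixed regex like this only handles one LogFormat and breaks on
escaped quotes, which is the kind of brittleness a dedicated parser is
meant to avoid.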

It is pluggable (hence you can load the Yauaa wrapper into it) and has
a Loader for Apache Pig which has been included in piggybank as a
standard feature since Pig 0.16.

I have been working on parsing NGINX logfiles, but that work is
currently unfinished and inactive.

If someone can send me a few example logformats for NGINX and a
handful of loglines that match these formats, that would certainly make
it a lot easier and faster for me to implement this. (A single
logformat with "all" fields would be great.)

> Sounds a bit similar to what Wikipedia once did.

> If applied in the right way it could have automated gathering device data
> in ways we were often told to do (e.g. at ApacheCon or other presentations
> I gave).

Yauaa simply applies the available rules to find the desired results.
These rules are 'static' per version and essentially only "parse" the
useragent. I deliberately left out lookup tables with device numbers
and things like that because that is the maintenance nightmare I
wanted to avoid.

Last week I briefly discussed this with a colleague and we came to the
conclusion that the "best" way is probably to have several 'large,
globally distributed' websites run a piece of JavaScript that gathers
device information (once per visit would be enough) and posts it to a
(separate) logging system together with the useragent. I say globally
distributed because I noticed that the mobile devices in use vary from
country to country. This data would then need analysis and (automatic)
conversion into a device map database, as a never-ending effort.
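
The "analysis and conversion" step could look roughly like the Python
sketch below. Everything in it is an assumption for illustration: the
record layout, the property names (screenWidth, screenHeight), the
function name build_device_map, and the choice of a simple majority
vote per property. None of this exists in Yauaa or DeviceMap.

```python
from collections import Counter, defaultdict


def build_device_map(records):
    """Per useragent, keep the most frequently reported value of each property."""
    # votes[useragent][property] counts how often each value was reported.
    votes = defaultdict(lambda: defaultdict(Counter))
    for record in records:
        for key, value in record["properties"].items():
            votes[record["useragent"]][key][value] += 1
    return {
        useragent: {
            key: counter.most_common(1)[0][0]
            for key, counter in properties.items()
        }
        for useragent, properties in votes.items()
    }


# Hypothetical records, as the page script might have posted them.
UA = "ExampleBrowser/1.0 (ExamplePhone)"
records = [
    {"useragent": UA, "properties": {"screenWidth": 360, "screenHeight": 640}},
    {"useragent": UA, "properties": {"screenWidth": 360, "screenHeight": 640}},
    {"useragent": UA, "properties": {"screenWidth": 412, "screenHeight": 640}},
]

device_map = build_device_map(records)
```

A majority vote is only one way to reconcile conflicting reports; a
real pipeline would also have to deal with spoofed useragents and with
devices that share a useragent string.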

> Well too late for DeviceMap, but certainly good to know and try in a
> different place...

Yes.

-- 

Best regards / Met vriendelijke groeten,

Niels Basjes
