Niels,

Thanks a lot for your reply and suggestions regarding Nginx.

I will probably have to check with Radu, Bertrand or others whether we're
allowed to share or harvest the logs from the DeviceMap VM, but I'm happy to
do so unless there is a concern about using them.

The VM is at http://devicemap-vm.apache.org/. I removed the links from
the site because once DeviceMap is archived, I assume the VM will be shut
down and we may no longer be able to log onto it to restart it either.
Its most important purpose, downloading device data, has shifted to the
official Apache mirrors, but as long as people still use unmodified release
versions of the DeviceMap Java or .NET clients, those still access the VM.

I had to get access to that VM from infra. In theory any committer, e.g.
Volkan, should be able to access it, so he may be able to help you. (Please
just check with whoever still responds from the PMC to see whether they
have a problem with using the logs.)

Kind Regards,
Werner


On Mon, 19 Dec 2016 16:53:21 GMT, Niels Basjes wrote:


Hi,

I'm Niels Basjes, the author of Yauaa, an Apache Avro committer and a
colleague of Volkan.

Werner Keil <werner.k...@gmail.com> wrote:

> It's hardly worth it for DeviceMap VM, but it seems YAUAA also has some
> nice log analysis tool to use on NGINX or Apache web server logs to analyze
> how many users visited a particular URL or extract UA data from it:
> https://github.com/nielsbasjes/yauaa/tree/master/examples/logparser

What you found is an example of how to use the Yauaa plugin
for https://github.com/nielsbasjes/logparser in Apache Pig.

This logparser project is intended to make parsing Apache HTTPD
logfiles easier and as such has been used by many of my colleagues for
quite a while.

It is pluggable (hence you can load the Yauaa wrapper into it) and has
a Loader for Apache Pig which has been included in piggybank as a
standard feature since Pig 0.16.

I have been working on parsing the NGinx logfiles but that work is
currently unfinished and inactive.

If someone can send me a few example logformats for NGinx and a
handful of loglines that match these formats, that would certainly make
it a lot easier for me to implement this. (A single logformat
with "all" fields would be great).
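
For illustration only (this is not part of logparser or Yauaa): a minimal
Python sketch, assuming nginx's predefined "combined" logformat, of how a
logformat and its matching loglines fit together and how the useragent can
be pulled out of each line. The helper names and the sample loglines are
made up for this example; a custom log_format would need a different
pattern.

```python
import re
from collections import Counter

# Assumption: the server uses nginx's predefined "combined" format:
#   '$remote_addr - $remote_user [$time_local] "$request" '
#   '$status $body_bytes_sent "$http_referer" "$http_user_agent"'
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one logline as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def count_user_agents(lines):
    """Count how often each useragent string occurs; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            counts[fields["http_user_agent"]] += 1
    return counts


# Hypothetical loglines, invented for this sketch (not real DeviceMap VM data).
SAMPLE = [
    '192.0.2.1 - - [19/Dec/2016:16:53:21 +0000] "GET /index.html HTTP/1.1" '
    '200 612 "-" "ExampleAgent/1.0"',
    '192.0.2.2 - - [19/Dec/2016:16:53:22 +0000] "GET /data.zip HTTP/1.1" '
    '200 1024 "-" "ExampleAgent/1.0"',
    '192.0.2.3 - - [19/Dec/2016:16:53:23 +0000] "GET /index.html HTTP/1.1" '
    '404 0 "-" "OtherAgent/2.0"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE).most_common():
        print(hits, agent)
```

The resulting useragent strings are what would then be handed to a parser
such as Yauaa for the actual analysis.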

> Sounds a bit similar to what Wikipedia once did.

> If applied in the right way it could have automated gathering device data
> in ways we were often told to do (e.g. at ApacheCon or other presentations
> I gave).

Yauaa simply applies the available rules to find the desired results.
These rules are 'static' per version and essentially only "parse" the
useragent. I deliberately left out lookup tables with device numbers
and things like that because that is the maintenance nightmare I
wanted to avoid.

Last week I briefly discussed this with a colleague and we came to the
conclusion that the "best" way is probably to have several 'large,
globally distributed' websites run a piece of JavaScript that gathers
information (once per visit would be enough) and posts that to a
(separate) logging system together with the useragent. I say globally
distributed because I noticed that the mobile devices in use vary
between countries. This data would then need analysis and (automatic)
conversion into a device map database as a never-ending effort.

> Well too late for DeviceMap, but certainly good to know and try in a
> different place...

Yes.

-- 

Best regards / Met vriendelijke groeten,

Niels Basjes
