Hi all,

I'm afraid I won't be able to share the complete HTTP logs I had planned to
contribute if we keep the original format.
Reza's proposal seems better to me.
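
To make the proposal concrete, here is a rough sketch of how the masking
and rounding could be done before upload. It is only an illustration:
it assumes IPv4 addresses and Python, and the function names mask_ip and
round_to_day are made up for this example, not part of any agreed tooling.

```python
import ipaddress
from datetime import datetime, timedelta


def mask_ip(ip: str, drop_bits: int = 8) -> str:
    """Zero the last `drop_bits` bits of an IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)
    # Build a 32-bit mask whose lowest `drop_bits` bits are cleared.
    mask = (0xFFFFFFFF << drop_bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(addr) & mask))


def round_to_day(ts: datetime) -> datetime:
    """Round a timestamp to the nearest day; noon and later round up."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if ts - midnight >= timedelta(hours=12):
        midnight += timedelta(days=1)
    return midnight


if __name__ == "__main__":
    # Documentation-range addresses, not real log data.
    print(mask_ip("192.0.2.123", 8))
    print(mask_ip("198.51.100.77", 16))
    print(round_to_day(datetime(2012, 10, 9, 15, 34)))
```

Dropping 16 bits instead of 8 hides more, but it would probably also make
the GeoIP lookup coarser, so the choice is a trade-off we would need to
agree on.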

Kind regards,

Bruno Verachten

> The only concern I have is that some of the information requested will have a 
> privacy impact. Is it possible to draft some sort of confidentiality 
> agreement between the sharing parties, this project, and Apache to make sure 
> this information is kept private and only used for the project?
>
> Also, would it make sense to remove the last 8 or 16 bits from the IP 
> address? I think it would also be a good idea to round the timestamp to the 
> nearest day.
>
> Thanks,
> Reza
>
> -----Original Message-----
> From: Stefano Andreani [mailto:[email protected]]
> Sent: Tuesday, October 09, 2012 3:34 PM
> To: [email protected]
> Subject: DDR update procedure
>
> Hi all,
>
> In order to set up a process to build and maintain the DDR, the first 
> requirement is to identify a way to allow the uploading of HTTP logs from 
> contributors, in order to analyze the user agents. We should not receive just 
> user agent lists, but full HTTP logs, for these reasons:
> - we need the timestamp for each user agent, in order to identify the 
> frequency of each user agent (we will not be able to process each user agent, 
> but hopefully enough to cover 99% of the HTTP requests);
> - we need the source IP address, in order to have a geographical map of the 
> distribution of the devices, for two reasons: (1) by analyzing IP addresses 
> (using a geographical DB, like GeoIP) we would have a map of uncovered 
> regions, so we would be able to improve the global coverage by chasing 
> contributions from specific regions; (2) the same device can have different 
> user agents depending on the region where it has been commercialized, and 
> using IPs we can improve the analysis.
>
> At the same time, uploading such information without a clear policy about how 
> that data is handled could raise privacy issues, so we must keep the upload 
> area private and guarantee that the information is used consistently with the 
> objectives of the DeviceMAP project, and not for other purposes.
>
> Do you agree on this?
>
> Can anyone from the Apache infrastructure team help identify a technical 
> solution that satisfies these requirements?
>
> Cheers,
> Stefano.
>



-- 
Bruno Verachten
