On Jun 26, 2006, at 7:18 PM, Ashkan Soltani wrote:

Questions:

What would be the best way to 'collect' this data, given that users may or may not have network access, could be firewalled, etc.? (Caveat: I'm looking for quick low-hanging fruit so that more time can be spent on the analysis side of the project.)

One idea I've considered is using HTTP in real time (or via a buffering system) to log the data to a central server. There's direct support for this in the Python logging module's HTTPHandler: http://docs.python.org/lib/module-logging.html
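For the sake of discussion, here's roughly what that would look like with the stdlib HTTPHandler. The host, port, and path are placeholders, not a real OSAF endpoint:

```python
import logging
import logging.handlers

# Hypothetical collection endpoint; port 8080 chosen because it's one of
# the well-known ports firewalls tend to let through.
handler = logging.handlers.HTTPHandler(
    "logs.example.org:8080", "/collect", method="POST")

logger = logging.getLogger("chandler.usage")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Each record's attributes are sent as form data in one POST, e.g.:
# logger.info("parcel-loaded: %s", "calendar")
```

Note that HTTPHandler sends one request per record, so some buffering in front of it would be needed for anything high-volume.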

Alternatives would be to use a 'sync' procedure, either via rsync/ftp or perhaps even the background sync module, to upload the log data to our servers. The implementation for this would be a bit more involved, especially since I'm not storing the log data in the repository and would need to figure out how to encapsulate it, but it might be more compatible with the rest of the Chandler methodology.
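The spool-then-sync shape could be sketched like this, assuming a local file outside the repository and leaving the actual transport open (the path and logger name are illustrative):

```python
import logging
import os

# Placeholder spool location; records accumulate here between syncs.
SPOOL = os.path.expanduser("~/.chandler/usage.log")
os.makedirs(os.path.dirname(SPOOL), exist_ok=True)

logger = logging.getLogger("chandler.usage.spool")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler(SPOOL))

def sync_spool():
    """Read everything buffered so far, hand it to the transport
    (rsync/WebDAV/HTTP PUT would go here), then truncate the spool."""
    with open(SPOOL, "r+") as f:
        data = f.read()
        # upload(data)  # transport deliberately left unspecified
        f.seek(0)
        f.truncate()
    return data

logger.info("session-start")
batch = sync_spool()
```

The upside of this shape is that it degrades gracefully when the network is unavailable: records just keep accumulating until the next successful sync.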

The last/simplest approach would be to use plain SMTP to post the information, since we know that more often than not users will have at least SMTP access from within Chandler (it is a mail app, after all).
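The stdlib also covers this case with SMTPHandler; a minimal sketch, with placeholder mailhost and addresses:

```python
import logging
import logging.handlers

# Every name here is illustrative, not a real OSAF mail setup.
smtp = logging.handlers.SMTPHandler(
    mailhost=("smtp.example.org", 25),
    fromaddr="chandler-client@example.org",
    toaddrs=["usage-logs@example.org"],
    subject="Chandler usage log")
smtp.setLevel(logging.INFO)

logger = logging.getLogger("chandler.usage.smtp")
logger.addHandler(smtp)

# Caveat: SMTPHandler mails one message per record, so in practice
# records would need to be batched into a single message, or the
# per-minute sending limits mentioned below would bite immediately.
```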

Here are some questions and concerns I thought about while reading your post:

You're right to be concerned about outgoing ports and transport mechanisms - unless we have a server listening on one of the well-known ports (80, 8080, 443, etc.), the vast majority of users will not be able to use the service. This may also be a concern if you decide to use email: a lot of people are behind mail servers that limit outbound email to X messages per minute or X amount of traffic, and it would also require some sort of configuration step so Chandler could authenticate to their SMTP server.

My biggest concern is that, given we could be logging personal or other sensitive information, even indirectly, either the transport stream would have to be encrypted or the data itself encrypted and then sent as binary data.
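The transport-encryption option is the cheaper of the two: the same HTTP upload, just over TLS so records never cross the wire in the clear. A sketch with a placeholder host and path:

```python
import http.client
import ssl

def upload(payload: bytes) -> int:
    """PUT one batch of log data over TLS; host and path are placeholders."""
    # create_default_context() verifies the server certificate, so clients
    # can't be silently redirected to an impostor collection server.
    ctx = ssl.create_default_context()
    conn = http.client.HTTPSConnection("logs.example.org", 443, context=ctx)
    conn.request("PUT", "/collect", body=payload,
                 headers={"Content-Type": "application/octet-stream"})
    try:
        return conn.getresponse().status
    finally:
        conn.close()
```

Encrypting the payload itself would additionally protect the data at rest on the collection server, at the cost of key management on both ends.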

We would also need to watch the volume of data being sent to this server, so we can find out sooner rather than later whether front-end load (receiving the data) or back-end load (expanding the data and putting it somewhere devs can access it) will be the bottleneck.

Your idea of using rsync doesn't seem practical to me: all of this data is newly generated on the client side, so rsync would just end up sending it all anyway. May as well stick with HTTP PUT or WebDAV or something like that.

Using sftp or ftp is IMO also a non-starter, as those protocols have plenty of issues from a security standpoint. Just watch how quickly our IP would be deluged by script kiddies once they found out we had an FTP port that allows PUTs with computer-generated user IDs.

One method I would propose would be to use XMPP and a pub/sub setup.

---
Bear

Build and Release Engineer
Open Source Applications Foundation (OSAF)
[EMAIL PROTECTED]
http://www.osafoundation.org

[EMAIL PROTECTED]
http://code-bear.com

PGP Fingerprint = 9996 719F 973D B11B E111  D770 9331 E822 40B3 CD29


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev