On 3.1.2014 04:09, Sameer Verma wrote:
> Happy new year! May 2014 bring good deeds and cheer :-)
>
> Here's a blog post on the different approaches (that I know of) to data
> gathering across different projects. Do let me know if I missed anything.
>
> cheers,
> Sameer
>
> http://www.olpcsf.org/node/204
Thanks for putting together the summary, Sameer. Here is more information about my xo-stats project.

The project's objective is to determine how XOs are used in Nepalese classrooms, but I intend the implementation to be general enough that it can be reused by other deployments as well. Like the other projects you've mentioned, I separated the project into four stages:

1) collecting data from the XO Journal backups on the schoolserver
2) extracting the data from the backups and storing it in a format suitable for analysis and visualization
3) statistically analyzing and visualizing the captured data
4) formulating recommendations for improving the program based on the analysis

Stage 1 is already implemented on both the server and the client side, so I first focused on the next step of extracting the data. Initially, I wanted to reuse one of the existing scripts, but I eventually found that none of them were general enough to meet my criteria. One of my goals is to make the script work on any version of Sugar.

Thus, I have been working on process_journal_stats.py, which takes a '/users' directory with XO Journal backups as input, pulls out the Journal metadata, and writes it out as a CSV or JSON file. Journal backups come in a variety of formats depending on the version of Sugar. The script currently supports the backup format present in Sugar versions 0.82 - 0.88, since the laptops distributed in Nepal are XO-1s running Sugar 0.82. I am planning to add support for later versions of Sugar in the next version of the script.

The script currently supports two ways to output statistical data.

To produce all statistical data from the Journal, one row per Journal record:

    process_journal_stats.py all

To extract statistical data about the use of activities on the system, use:

    process_journal_stats.py activity

The full documentation with all the options is in the README at: https://github.com/martasd/xo-stats

One challenge of the project has been determining how much data processing to do in the Python script and what to leave for the data analysis and visualization tools later in the workflow. For now, I have stopped adding features to the script and am evaluating the most appropriate tools for visualizing the data. Here are some of the questions I intend to answer with the visualizations and analysis:

* How many times do installed activities get used? How does activity use differ over time?
* Which activities are children using to create files? What kinds of files are being created?
* Which activities are being launched in share mode, and how often?
* Which part of the day do children play with the activities?
* How does the set of activities used evolve as children age?

I will also be looking at how the answers to these questions vary from class to class, school to school, and region to region. (A rough sketch of the kind of aggregation behind the first question is included further below.) As Martin Abente and Sameer mentioned above, our work needs to be informed by discussions with the stakeholders: children, educators, parents, school administrators, etc. We do have educational experts on the staff at OLE, who have worked with more than 50 schools altogether, and I will be talking to them as I look beyond answering the obvious questions.

For visualization, I have explored using LibreOffice and SOFA, but neither was flexible enough to allow customizing the output beyond a few rudimentary options, so I started looking at various JavaScript libraries, which are much more powerful.
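To make the charting step concrete, here is a minimal sketch of the kind of aggregation behind the first question above: counting activity launches per month from the CSV that 'process_journal_stats.py all' produces. This is not code from xo-stats itself, and the column names 'activity' and 'mtime' are only placeholders, since the actual headers depend on which Journal metadata fields end up in the CSV:

    import csv
    from collections import Counter
    from datetime import datetime

    def launches_per_month(csv_path):
        """Count Journal records per (month, activity) pair."""
        counts = Counter()
        with open(csv_path) as stats:
            for row in csv.DictReader(stats):
                activity = row.get('activity', 'unknown')
                # Assumes an ISO-style timestamp such as '2013-05-04T10:33:21';
                # adjust the slice/format to whatever the CSV actually contains.
                month = datetime.strptime(row['mtime'][:10], '%Y-%m-%d')
                counts[(month.strftime('%Y-%m'), activity)] += 1
        return counts

    if __name__ == '__main__':
        results = launches_per_month('journal_stats.csv')
        for (month, activity), count in sorted(results.items()):
            print('%s,%s,%d' % (month, activity, count))

The month/activity/count rows this produces are simple enough to feed directly into whichever charting library I end up using.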
Currently, I am experimenting with Google Charts, which I found the easiest to get started with. If I run into limitations with Google Charts in the future, others on my list are the InfoVis Toolkit (http://philogb.github.io/jit) and Highcharts (http://highcharts.com). Then there is also D3.js, but that's a bigger animal.

Alternatively, or perhaps in parallel, I am also willing to join efforts to improve the OLPC Dashboard, which is trying to answer questions very similar to mine. I am looking forward to collaborating with everyone who is interested in exploring ways to analyze and visualize OLPC/Sugar data in an interesting and meaningful way.

Cheers,
Martin