Happening tomorrow!

---------- Forwarded message ---------
From: Galen Weld <gw...@cs.washington.edu>
Date: Thu, May 2, 2019 at 10:58 AM
Subject: Change Seminar, Tuesday, May 7: Fahad Pervaiz, Understanding
Challenges in the Data Pipeline for Development Data
To: <change@change.washington.edu>

Please join us for the Change Seminar next Tuesday, May 7, at 12pm in CSE

The developing world is relying more and more on data-driven policies.
Numerous development agencies have pushed for on-ground data collection to
support the development work they pursue. Many governments have launched
efforts for more frequent information gathering. Overall, the amount of
data collected is tremendous, yet we face significant issues in doing
useful analysis. Most of these barriers are around data cleaning and
merging, and they require a data engineer to support some parts of the
analysis. This thesis aims to understand the pain points of cleaning
development data. It also proposes solutions that harness the thought
process of a data engineer to reduce the manual workload of the tedious
process of cleaning such data. To achieve these goals, two research areas
are critical: (1) to discern current data usage patterns and to build a
taxonomy of data cleaning in the developing world; and (2) to create
algorithms to support automated data cleaning, which target selected
problems including matching transliterated names. With these goals, this
thesis will empower regular data users to easily do the necessary data
cleaning and scrubbing for analysis.

Fahad Pervaiz <https://homes.cs.washington.edu/~fahadp/> is a graduating
computer science PhD student at the Paul G. Allen School, University of
Washington, advised by Richard Anderson
<https://www.cs.washington.edu/people/faculty/anderson> in the ICTD Lab
<http://ictd.cs.washington.edu/>. Within the broad spectrum of developing
technology for low-income countries, his interests include data processing,
data infrastructure, HCI and public health. His research focuses on
understanding the challenges in processing data collected in low-income
countries and exploring solutions that are tailored to mitigate various
issues in the data pipeline.