Algorithms for Large Data Analytics via Coresets and Sketches

ALS 4001 ** Note the different time and place **
Fri, 03/11/2016 - 9:00am

Jeff Phillips
Assistant Professor, School of Computing, University of Utah

Abstract:
Over the last decade, many companies and scientists have been generating
enormous quantities of data, yet often lack the facilities to properly
collect, annotate, or analyze it.  An emerging approach to this problem is
to create coresets and sketches of the data.  These are powerful summaries
that can be maintained efficiently and, for important aspects of the data,
can be queried much like the original data, but far more efficiently and
with bounded error.  Impressively, the sizes of the summaries depend only on
the error in the approximation guarantees.

In this talk I will discuss my work in developing algorithms for coresets and
sketches central to data analysis, as well as some of the broader
computational and analytical consequences of working with them.  I will focus
on two classes of summaries.  The first is a sketch for matrices, called
Frequent Directions (FD).  Matrix sketching is the most common preprocessing
technique for many enormous data sets used in machine learning and data
mining.  FD is efficient and general to construct, and it provides the
smallest size/approximation-error ratio for common error measures, an order
of magnitude better than those based on random projections or random
sampling.
The second is a coreset for kernel density estimates, the dominant way to
model noisy spatial data.  In this setting we also show significant
improvements over simple random sampling approaches.  Moreover, we describe
how this coreset can be used to preserve worst-case L_infinity bounds, which
are needed to retain anomalous events, important spatial patterns, and even
topological properties of the data.
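
To make the matrix-sketching idea concrete, here is a minimal sketch of the
simple one-row-at-a-time variant of Frequent Directions, written with NumPy.
The abstract gives no pseudocode, so the function name, the parameter `ell`
(number of sketch rows), and the requirement ell <= d are assumptions made
for illustration; this is not necessarily the implementation discussed in
the talk.  It maintains an ell x d sketch B of an n x d matrix A such that
the spectral norm of A^T A - B^T B is at most ||A||_F^2 / ell.

```python
import numpy as np


def frequent_directions(A, ell):
    """Return an ell x d sketch B of the n x d matrix A.

    Illustrative, unoptimized variant (one SVD per input row).
    Assumes 1 <= ell <= d.
    """
    n, d = A.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= d")

    B = np.zeros((ell, d))
    for row in A:
        # The last row of B is all zeros at this point, so it is free.
        B[-1] = row
        # Thin SVD: s has length ell, Vt has shape (ell, d).
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Shrink every squared singular value by the smallest one.
        # This zeroes out the last direction and frees the last row again.
        delta = s[-1] ** 2
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt
    return B
```

The error can be checked directly by comparing
np.linalg.norm(A.T @ A - B.T @ B, 2) against
np.linalg.norm(A, "fro") ** 2 / ell.  Running an SVD for every row is slow;
a common speedup is to keep 2 * ell rows and shrink only when the buffer
fills, which this sketch omits for clarity.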

Bio:


URL:
http://eecs.oregonstate.edu/colloquium/algorithms-large-data-analytics-coresets-and-sketches

_______________________________________________
Colloquium mailing list
[email protected]
https://secure.engr.oregonstate.edu/mailman/listinfo/colloquium
