http://www.datasciencecentral.com/profiles/blogs/ibm-distinguished-engineer-solves-big-data-conjecture
IBM Distinguished Engineer solves Big Data Conjecture

Posted by Vincent Granville on October 23, 2013 at 3:28pm

A mathematical problem related to big data was solved by Jean-Francois
Puget, an engineer in the Solutions Analytics and Optimization group at IBM
France. The problem was first posed on Data Science Central, and an
award was offered to the first data scientist to solve it.

Bryan Gorman, Principal Physicist, Chief Scientist at Johns Hopkins
University Applied Physics Laboratory, made a significant breakthrough in
July and won $500. Jean-Francois Puget solved the problem completely,
independently of Bryan, and won a $1,000 award.

*Example of a rare, special permutation investigated to prove the theorem*

The competition was organized and financed by Data Science Central.
Participants from around the world submitted a number of interesting
approaches. The mathematical question was posed by Vincent Granville, a
leading data scientist and co-founder of Data Science Central. Granville
initially proposed a solution based on large-scale Monte Carlo
simulations, but it turned out to be wrong.

The problem consisted of finding an exact formula for a new type of
correlation and goodness-of-fit metric, designed specifically for big
data, that generalizes Spearman's rank correlation coefficient and is
especially robust for the unbounded, ordinal data found in large data sets.
From a mathematical point of view, the new metric is based on L1 rather
than L2 theory: in other words, it relies on absolute rather than squared
differences. Using squares (or higher powers) is what makes traditional
metrics such as R squared notoriously sensitive to outliers, and why savvy
statistical modelers avoid them. In big data, outliers are plentiful and
can invalidate the conclusions of a statistical analysis, so this is a
critical issue. This outlier issue is sometimes referred to as *the curse
of big data*... [more: http://www.datasciencecentral.com/profiles/blogs/ibm-distinguished-engineer-solves-big-data-conjecture]
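The excerpt does not state the new metric's formula, so the sketch below does not reproduce Granville's metric or the solved formula. It only illustrates the L2 versus L1 distinction on ranks, assuming distinct values (no ties): the classical Spearman coefficient uses squared rank differences, while Spearman's footrule, a well-known L1 rank statistic swapped in here for illustration, sums absolute rank differences. The helper names are hypothetical, and the brute-force function checks the known result that the largest footrule over all permutations of n items is floor(n^2/2), attained by the reversed order.

```python
from itertools import permutations


def ranks(values):
    """Return 1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_differences(x, y):
    """Per-item differences between the ranks of x and the ranks of y."""
    return [a - b for a, b in zip(ranks(x), ranks(y))]


def spearman_rho(x, y):
    """Classical Spearman coefficient (L2): 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d = rank_differences(x, y)
    return 1 - 6 * sum(di * di for di in d) / (n * (n * n - 1))


def footrule(x, y):
    """Spearman's footrule (L1): sum of absolute rank differences."""
    return sum(abs(di) for di in rank_differences(x, y))


def max_footrule_bruteforce(n):
    """Largest footrule over all permutations of 1..n (exhaustive; small n only)."""
    ident = range(1, n + 1)
    return max(sum(abs(p - i) for p, i in zip(perm, ident))
               for perm in permutations(ident))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 3, 4, 5, 1]  # the first item is displaced to the last position
    print(rank_differences(x, y), spearman_rho(x, y), footrule(x, y))
```

In this usage example the single displaced item has rank difference 4: it contributes 16 of the 20 units in the squared (L2) sum but only 4 of the 8 units in the absolute (L1) sum, which is the kind of reduced sensitivity to one extreme value that the L1 approach aims for.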


-tj
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
