[4/4] impala git commit: IMPALA-5310: Part 2: Add SAMPLED_NDV() function.

kwho Tue, 12 Dec 2017 15:52:47 -0800

IMPALA-5310: Part 2: Add SAMPLED_NDV() function.

Adds a new SAMPLED_NDV() aggregate function that is
intended to be used in COMPUTE STATS TABLESAMPLE.
This patch only adds the function itself. Integration
with COMPUTE STATS will come in a separate patch.


SAMPLED_NDV() estimates the number of distinct values (NDV)
based on a sample of data and the corresponding sampling rate.
The main idea is to collect several x/y data points where x is
the number of rows and y is the corresponding NDV estimate.
These data points are used to fit an objective function to the
data such that the true NDV can be extrapolated.
The aggregate function maintains a fixed number of HyperLogLog
intermediates to compute the x/y points.
Several objective functions are fit and the best-fit one is
used for extrapolation.

Adds the MPFIT C library to perform curve fitting:
https://www.physics.wisc.edu/~craigm/idl/cmpfit.html

The library is a C port from Fortran. Scipy uses the
Fortran version of the library for curve fitting.

Testing:
- added functional tests
- core/hdfs run passed

Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
Reviewed-on: http://gerrit.cloudera.org:8080/8569
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/0936e329
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/0936e329
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/0936e329

Branch: refs/heads/master
Commit: 0936e3296697aa63f07c0919629d0967b88474e6
Parents: e57c77f
Author: Alex Behm <[email protected]>
Authored: Fri Oct 6 22:11:04 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Tue Dec 12 22:20:18 2017 +0000

----------------------------------------------------------------------
 LICENSE.txt                                     |   81 +
 NOTICE.txt                                      |    4 +
 be/src/exprs/aggregate-functions-ir.cc          |  209 +-
 be/src/exprs/aggregate-functions.h              |   17 +
 be/src/thirdparty/mpfit/DISCLAIMER              |   77 +
 be/src/thirdparty/mpfit/README                  |  705 ++++++
 be/src/thirdparty/mpfit/mpfit.c                 | 2305 ++++++++++++++++++
 be/src/thirdparty/mpfit/mpfit.h                 |  198 ++
 be/src/util/CMakeLists.txt                      |    3 +
 be/src/util/mpfit-util.cc                       |   82 +
 be/src/util/mpfit-util.h                        |   98 +
 bin/rat_exclude_files.txt                       |    3 +-
 bin/run_clang_tidy.sh                           |    5 +-
 .../impala/analysis/FunctionCallExpr.java       |   27 +-
 .../org/apache/impala/catalog/BuiltinsDb.java   |   39 +-
 .../org/apache/impala/catalog/HdfsTable.java    |    1 -
 .../impala/analysis/AnalyzeStmtsTest.java       |   53 +
 .../apache/impala/common/FrontendTestBase.java  |   12 +-
 tests/query_test/test_aggregation.py            |   69 +
 19 files changed, 3976 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/LICENSE.txt
----------------------------------------------------------------------
diff --git a/LICENSE.txt b/LICENSE.txt
index 419a726..aa211ad 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -946,3 +946,84 @@ Some portions of this module are derived from code from 
FastHash
   OTHER DEALINGS IN THE SOFTWARE.
 
 
--------------------------------------------------------------------------------
+
+be/src/thirdparty/mpfit
+
+MPFIT: A MINPACK-1 Least Squares Fitting Library in C
+
+Original public domain version by B. Garbow, K. Hillstrom, J. More'
+  (Argonne National Laboratory, MINPACK project, March 1980)
+  Copyright (1999) University of Chicago
+    (see below)
+
+Tranlation to C Language by S. Moshier (moshier.net)
+  (no restrictions placed on distribution)
+
+Enhancements and packaging by C. Markwardt
+  (comparable to IDL fitting routine MPFIT
+   see http://cow.physics.wisc.edu/~craigm/idl/idl.html)
+  Copyright (C) 2003, 2004, 2006, 2007 Craig B. Markwardt
+
+  This software is provided as is without any warranty whatsoever.
+  Permission to use, copy, modify, and distribute modified or
+  unmodified copies is granted, provided this copyright and disclaimer
+  are included unchanged.
+
+
+Source code derived from MINPACK must have the following disclaimer
+text provided.
+
+===========================================================================
+Minpack Copyright Notice (1999) University of Chicago.  All rights reserved
+
+Redistribution and use in source and binary forms, with or
+without modification, are permitted provided that the
+following conditions are met:
+
+1. Redistributions of source code must retain the above
+copyright notice, this list of conditions and the following
+disclaimer.
+
+2. Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following
+disclaimer in the documentation and/or other materials
+provided with the distribution.
+
+3. The end-user documentation included with the
+redistribution, if any, must include the following
+acknowledgment:
+
+   "This product includes software developed by the
+   University of Chicago, as Operator of Argonne National
+   Laboratory.
+
+Alternately, this acknowledgment may appear in the software
+itself, if and wherever such third-party acknowledgments
+normally appear.
+
+4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
+WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
+UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
+THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
+OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
+OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
+USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
+THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
+DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
+UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
+BE CORRECTED.
+
+5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
+HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
+ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
+INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
+ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
+PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
+SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
+(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
+EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
+POSSIBILITY OF SUCH LOSS OR DAMAGES.
+
+--------------------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/NOTICE.txt
----------------------------------------------------------------------
diff --git a/NOTICE.txt b/NOTICE.txt
index 3760589..c2875a2 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -13,3 +13,7 @@ Project for use in the OpenSSL Toolkit 
(http://www.openssl.org/)
 This product includes cryptographic software written by Eric Young
 ([email protected]).  This product includes software written by Tim
 Hudson ([email protected]).
+
+This product includes software developed by the University of Chicago,
+as Operator of Argonne National Laboratory.
+Copyright (C) 1999 University of Chicago. All rights reserved.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/be/src/exprs/aggregate-functions-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/aggregate-functions-ir.cc 
b/be/src/exprs/aggregate-functions-ir.cc
index d5f946a..499188c 100644
--- a/be/src/exprs/aggregate-functions-ir.cc
+++ b/be/src/exprs/aggregate-functions-ir.cc
@@ -17,24 +17,25 @@
 
 #include "exprs/aggregate-functions.h"
 
-#include <math.h>
 #include <algorithm>
 #include <map>
 #include <sstream>
 #include <utility>
+#include <cmath>
 
 #include <boost/random/ranlux.hpp>
 #include <boost/random/uniform_int.hpp>
 
 #include "codegen/impala-ir.h"
 #include "common/logging.h"
+#include "exprs/anyval-util.h"
+#include "exprs/hll-bias.h"
 #include "runtime/decimal-value.inline.h"
 #include "runtime/runtime-state.h"
 #include "runtime/string-value.inline.h"
 #include "runtime/timestamp-value.h"
 #include "runtime/timestamp-value.inline.h"
-#include "exprs/anyval-util.h"
-#include "exprs/hll-bias.h"
+#include "util/mpfit-util.h"
 
 #include "common/names.h"
 
@@ -42,6 +43,7 @@ using boost::uniform_int;
 using boost::ranlux64_3;
 using std::make_pair;
 using std::map;
+using std::min_element;
 using std::nth_element;
 using std::pop_heap;
 using std::push_heap;
@@ -1480,6 +1482,186 @@ BigIntVal 
AggregateFunctions::HllFinalize(FunctionContext* ctx, const StringVal&
   return estimate;
 }
 
+/// Intermediate aggregation state for the SampledNdv() function.
+/// Stores NUM_HLL_BUCKETS of the form <row_count, hll_state>.
+/// The 'row_count' keeps track of how many input rows were aggregated into 
that
+/// bucket, and the 'hll_state' is an intermediate aggregation state of 
HyperLogLog.
+/// See the header comments on the SampledNdv() function for more details.
+class SampledNdvState {
+ public:
+  /// Empirically determined number of HLL buckets. Power of two for fast 
modulo.
+  static const uint32_t NUM_HLL_BUCKETS = 32;
+
+  /// A bucket contains an update count and an HLL intermediate state.
+  static constexpr int64_t BUCKET_SIZE = sizeof(int64_t) + 
AggregateFunctions::HLL_LEN;
+
+  /// Sampling percent which was given as the second argument to SampledNdv().
+  /// Stored here to avoid existing issues with passing constant arguments to 
all
+  /// aggregation phases and because we convert the sampling percent argument 
from
+  /// decimal to double. See IMPALA-6179.
+  double sample_perc;
+
+  /// Counts the number of Update() calls. Used for determining which bucket 
to update.
+  int64_t total_row_count;
+
+  /// Array of buckets.
+  struct {
+    int64_t row_count;
+    uint8_t hll[AggregateFunctions::HLL_LEN];
+  } buckets[NUM_HLL_BUCKETS];
+};
+
+void AggregateFunctions::SampledNdvInit(FunctionContext* ctx, StringVal* dst) {
+  // Uses a preallocated FIXED_UDA_INTERMEDIATE intermediate value.
+  DCHECK_EQ(dst->len, sizeof(SampledNdvState));
+  memset(dst->ptr, 0, sizeof(SampledNdvState));
+
+  DoubleVal* sample_perc = 
reinterpret_cast<DoubleVal*>(ctx->GetConstantArg(1));
+  if (sample_perc == nullptr) return;
+  // Guaranteed by the FE.
+  DCHECK(!sample_perc->is_null);
+  DCHECK_GE(sample_perc->val, 0.0);
+  DCHECK_LE(sample_perc->val, 1.0);
+  SampledNdvState* state = reinterpret_cast<SampledNdvState*>(dst->ptr);
+  state->sample_perc = sample_perc->val;
+}
+
+/// Incorporate the 'src' into one of the intermediate HLLs, which will be 
used by
+/// Finalize() to generate a set of the (x,y) data points.
+template <typename T>
+void AggregateFunctions::SampledNdvUpdate(FunctionContext* ctx, const T& src,
+    const DoubleVal& sample_perc, StringVal* dst) {
+  SampledNdvState* state = reinterpret_cast<SampledNdvState*>(dst->ptr);
+  int64_t bucket_idx = state->total_row_count % 
SampledNdvState::NUM_HLL_BUCKETS;
+  StringVal hll_dst = StringVal(state->buckets[bucket_idx].hll, HLL_LEN);
+  HllUpdate(ctx, src, &hll_dst);
+  ++state->buckets[bucket_idx].row_count;
+  ++state->total_row_count;
+}
+
+void AggregateFunctions::SampledNdvMerge(FunctionContext* ctx, const 
StringVal& src,
+    StringVal* dst) {
+  SampledNdvState* src_state = reinterpret_cast<SampledNdvState*>(src.ptr);
+  SampledNdvState* dst_state = reinterpret_cast<SampledNdvState*>(dst->ptr);
+  for (int i = 0; i < SampledNdvState::NUM_HLL_BUCKETS; ++i) {
+    StringVal src_hll = StringVal(src_state->buckets[i].hll, HLL_LEN);
+    StringVal dst_hll = StringVal(dst_state->buckets[i].hll, HLL_LEN);
+    HllMerge(ctx, src_hll, &dst_hll);
+    dst_state->buckets[i].row_count += src_state->buckets[i].row_count;
+  }
+  // Total count. Not really needed after Update() but kept for sanity 
checking.
+  dst_state->total_row_count += src_state->total_row_count;
+  // Propagate sampling percent to Finalize().
+  dst_state->sample_perc = src_state->sample_perc;
+}
+
+BigIntVal AggregateFunctions::SampledNdvFinalize(FunctionContext* ctx,
+    const StringVal& src) {
+  SampledNdvState* state = reinterpret_cast<SampledNdvState*>(src.ptr);
+
+  // Generate 'num_points' data points with x=row_count and y=ndv_estimate. 
These points
+  // are used to fit a function for the NDV growth and estimate the real NDV.
+  constexpr int num_points =
+      SampledNdvState::NUM_HLL_BUCKETS * SampledNdvState::NUM_HLL_BUCKETS;
+  int64_t counts[num_points] = { 0 };
+  int64_t ndvs[num_points] = { 0 };
+
+  int64_t min_ndv = numeric_limits<int64_t>::max();
+  int64_t min_count = numeric_limits<int64_t>::max();
+  // We have a fixed number of HLL intermediates to generate data points. Any 
unique
+  // subset of intermediates can be combined to create a new data point. It was
+  // empirically determined that 'num_data' points is typically sufficient and 
there are
+  // diminishing returns from generating additional data points.
+  // The generation method below was chosen for its simplicity. It 
successively merges
+  // buckets in a rolling window of size NUM_HLL_BUCKETS. Repeating the last 
data point
+  // where all buckets are merged biases the curve fitting to hit that data 
point which
+  // makes sense because that's likely the most accurate one. The number of 
data points
+  // are sufficient for reasonable accuracy.
+  int pidx = 0;
+  for (int i = 0; i < SampledNdvState::NUM_HLL_BUCKETS; ++i) {
+    uint8_t merged_hll_data[HLL_LEN];
+    memset(merged_hll_data, 0, HLL_LEN);
+    StringVal merged_hll(merged_hll_data, HLL_LEN);
+    int64_t merged_count = 0;
+    for (int j = 0; j < SampledNdvState::NUM_HLL_BUCKETS; ++j) {
+      int bucket_idx = (i + j) % SampledNdvState::NUM_HLL_BUCKETS;
+      merged_count += state->buckets[bucket_idx].row_count;
+      counts[pidx] = merged_count;
+      StringVal hll = StringVal(state->buckets[bucket_idx].hll, HLL_LEN);
+      HllMerge(ctx, hll, &merged_hll);
+      ndvs[pidx] = HllFinalEstimate(merged_hll.ptr, HLL_LEN);
+      ++pidx;
+    }
+    min_count = std::min(min_count, state->buckets[i].row_count);
+    min_ndv = std::min(min_ndv, ndvs[i * SampledNdvState::NUM_HLL_BUCKETS]);
+  }
+  // Based on the point-generation method above the last elements represent 
the data
+  // point where all buckets are merged.
+  int64_t max_count = counts[num_points - 1];
+  int64_t max_ndv = ndvs[num_points - 1];
+
+  // Scale all values to [0,1] since some objective functions require it 
(e.g., Sigmoid).
+  double count_scale = max_count - min_count;
+  double ndv_scale = max_ndv - min_ndv;
+  if (count_scale == 0) count_scale = 1.0;
+  if (ndv_scale == 0) ndv_scale = 1.0;
+  double scaled_counts[num_points];
+  double scaled_ndvs[num_points];
+  for (int i = 0; i < num_points; ++i) {
+    scaled_counts[i] = counts[i] / count_scale;
+    scaled_ndvs[i] = ndvs[i] / ndv_scale;
+  }
+
+  // List of objective functions. Curve fitting will select the best values 
for the
+  // parameters a, b, c, d.
+  vector<ObjectiveFunction> ndv_fns;
+  // Linear function: f(x) = a + b * x
+  ndv_fns.push_back(ObjectiveFunction("LIN", 2,
+      [](double x, const double* params) -> double {
+        return params[0] + params[1] * x;
+      }
+  ));
+  // Logarithmic function: f(x) = a + b * log(x)
+  ndv_fns.push_back(ObjectiveFunction("LOG", 2,
+      [](double x, const double* params) -> double {
+        return params[0] + params[1] * log(x);
+      }
+  ));
+  // Power function: f(x) = a + b * pow(x, c)
+  ndv_fns.push_back(ObjectiveFunction("POW", 3,
+      [](double x, const double* params) -> double {
+        return params[0] + params[1] * pow(x, params[2]);
+      }
+  ));
+  // Sigmoid function: f(x) = a + b * (c / (c + pow(d, -x)))
+  ndv_fns.push_back(ObjectiveFunction("SIG", 4,
+      [](double x, const double* params) -> double {
+        return params[0] + params[1] * (params[2] / (params[2] + 
pow(params[3], -x)));
+      }
+  ));
+
+  // Perform least mean squares fitting on all objective functions.
+  vector<ObjectiveFunction> valid_ndv_fns;
+  for (ObjectiveFunction& f: ndv_fns) {
+    if(f.LmsFit(scaled_counts, scaled_ndvs, num_points)) {
+      valid_ndv_fns.push_back(std::move(f));
+    }
+  }
+
+  // Select the best-fit function for estimating the NDV.
+  auto best_fit_fn = min_element(valid_ndv_fns.begin(), valid_ndv_fns.end(),
+      [](const ObjectiveFunction& a, const ObjectiveFunction& b) -> bool {
+        return a.GetError() < b.GetError();
+      }
+  );
+
+  // Compute the extrapolated NDV based on the extrapolated row count.
+  double extrap_count = max_count / state->sample_perc;
+  double scaled_extrap_count = extrap_count / count_scale;
+  double scaled_extrap_ndv = best_fit_fn->GetY(scaled_extrap_count);
+  return round(scaled_extrap_ndv * ndv_scale);
+}
+
 // An implementation of a simple single pass variance algorithm. A standard 
UDA must
 // be single pass (i.e. does not scan the table more than once), so the most 
canonical
 // two pass approach is not practical.
@@ -2140,6 +2322,27 @@ template void AggregateFunctions::HllUpdate(
 template void AggregateFunctions::HllUpdate(
     FunctionContext*, const DecimalVal&, StringVal*);
 
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const BooleanVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const TinyIntVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const SmallIntVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const IntVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const BigIntVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const FloatVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const DoubleVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const StringVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const TimestampVal&, const DoubleVal&, StringVal*);
+template void AggregateFunctions::SampledNdvUpdate(
+    FunctionContext*, const DecimalVal&, const DoubleVal&, StringVal*);
+
 template void AggregateFunctions::KnuthVarUpdate(
     FunctionContext*, const TinyIntVal&, StringVal*);
 template void AggregateFunctions::KnuthVarUpdate(

http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/be/src/exprs/aggregate-functions.h
----------------------------------------------------------------------
diff --git a/be/src/exprs/aggregate-functions.h 
b/be/src/exprs/aggregate-functions.h
index 5ebcb97..81884c1 100644
--- a/be/src/exprs/aggregate-functions.h
+++ b/be/src/exprs/aggregate-functions.h
@@ -204,6 +204,23 @@ class AggregateFunctions {
   /// estimates.
   static uint64_t HllFinalEstimate(const uint8_t* buckets, int32_t 
num_buckets);
 
+  /// Estimates the number of distinct values (NDV) based on a sample of data 
and the
+  /// corresponding sampling rate. The main idea of this function is to 
collect several
+  /// (x,y) data points where x is the number of rows and y is the 
corresponding NDV
+  /// estimate. These data points are used to fit an objective function to the 
data such
+  /// that the true NDV can be extrapolated.
+  /// This aggregate function maintains a fixed number of HyperLogLog 
intermediates.
+  /// The Update() phase updates the intermediates in a round-robin fashion.
+  /// The Merge() phase combines the corresponding intermediates.
+  /// The Finalize() phase generates (x,y) data points, performs curve 
fitting, and
+  /// computes the estimated true NDV.
+  static void SampledNdvInit(FunctionContext*, StringVal* dst);
+  template <typename T>
+  static void SampledNdvUpdate(FunctionContext*, const T& src,
+      const DoubleVal& sample_perc, StringVal* dst);
+  static void SampledNdvMerge(FunctionContext*, const StringVal& src, 
StringVal* dst);
+  static BigIntVal SampledNdvFinalize(FunctionContext*, const StringVal& src);
+
   /// Knuth's variance algorithm, more numerically stable than canonical stddev
   /// algorithms; reference implementation:
   /// 
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/be/src/thirdparty/mpfit/DISCLAIMER
----------------------------------------------------------------------
diff --git a/be/src/thirdparty/mpfit/DISCLAIMER 
b/be/src/thirdparty/mpfit/DISCLAIMER
new file mode 100644
index 0000000..3e1b76f
--- /dev/null
+++ b/be/src/thirdparty/mpfit/DISCLAIMER
@@ -0,0 +1,77 @@
+
+MPFIT: A MINPACK-1 Least Squares Fitting Library in C
+
+Original public domain version by B. Garbow, K. Hillstrom, J. More'
+  (Argonne National Laboratory, MINPACK project, March 1980)
+  Copyright (1999) University of Chicago
+    (see below)
+
+Tranlation to C Language by S. Moshier (moshier.net)
+  (no restrictions placed on distribution)
+
+Enhancements and packaging by C. Markwardt
+  (comparable to IDL fitting routine MPFIT
+   see http://cow.physics.wisc.edu/~craigm/idl/idl.html)
+  Copyright (C) 2003, 2004, 2006, 2007 Craig B. Markwardt
+
+  This software is provided as is without any warranty whatsoever.
+  Permission to use, copy, modify, and distribute modified or
+  unmodified copies is granted, provided this copyright and disclaimer
+  are included unchanged.
+
+
+Source code derived from MINPACK must have the following disclaimer
+text provided.
+
+===========================================================================
+Minpack Copyright Notice (1999) University of Chicago.  All rights reserved
+
+Redistribution and use in source and binary forms, with or
+without modification, are permitted provided that the
+following conditions are met:
+
+1. Redistributions of source code must retain the above
+copyright notice, this list of conditions and the following
+disclaimer.
+
+2. Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following
+disclaimer in the documentation and/or other materials
+provided with the distribution.
+
+3. The end-user documentation included with the
+redistribution, if any, must include the following
+acknowledgment:
+
+   "This product includes software developed by the
+   University of Chicago, as Operator of Argonne National
+   Laboratory.
+
+Alternately, this acknowledgment may appear in the software
+itself, if and wherever such third-party acknowledgments
+normally appear.
+
+4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
+WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
+UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
+THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
+OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
+OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
+USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
+THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
+DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
+UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
+BE CORRECTED.
+
+5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
+HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
+ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
+INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
+ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
+PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
+SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
+(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
+EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
+POSSIBILITY OF SUCH LOSS OR DAMAGES.

http://git-wip-us.apache.org/repos/asf/impala/blob/0936e329/be/src/thirdparty/mpfit/README
----------------------------------------------------------------------
diff --git a/be/src/thirdparty/mpfit/README b/be/src/thirdparty/mpfit/README
new file mode 100644
index 0000000..29a9cef
--- /dev/null
+++ b/be/src/thirdparty/mpfit/README
@@ -0,0 +1,705 @@
+
+MPFIT: A MINPACK-1 Least Squares Fitting Library in C
+
+Original public domain version by B. Garbow, K. Hillstrom, J. More'
+  (Argonne National Laboratory, MINPACK project, March 1980)
+
+Tranlation to C Language by S. Moshier (moshier.net)
+
+Enhancements, documentation and packaging by C. Markwardt
+  (comparable to IDL fitting routine MPFIT
+   see http://cow.physics.wisc.edu/~craigm/idl/idl.html)
+  Copyright (C) 2003, 2004, 2006, 2007, 2009, 2010, 2013 Craig B. Markwardt
+
+
+SUMMARY of CHANGES
+ 16 Feb 2009 - version 1.0 - initial release
+ 18 Feb 2009 - version 1.1 - add 'version' field to 'results' struct
+ 22 Nov 2009 -             - allow to compile with C++ compiler
+                           - change to memset() instead of nonstandard bzero()
+                           - for Microsoft, proprietary equivalent of finite()
+ 04 Oct 2010 -             - add static declarations, remove some compiler 
warnings
+                             (reported by Lars Kr. Lundin)
+ 13 Nov 2010 - version 1.2 - additional documentation, cleanup of mpfit.h
+ 23 Apr 2013 - version 1.3 - add MP_NO_ITER; bug fix mpside=2 when debugging
+                             (thanks to M. Wojdyr)
+
+$Id: README,v 1.18 2016/06/02 19:14:16 craigm Exp $
+
+INTRODUCTION
+
+MPFIT uses the Levenberg-Marquardt technique to solve the
+least-squares problem.  In its typical use, MPFIT will be used to
+fit a user-supplied function (the "model") to user-supplied data
+points (the "data") by adjusting a set of parameters.  MPFIT is
+based upon MINPACK-1 (LMDIF.F) by More' and collaborators.
+
+For example, a researcher may think that a set of observed data
+points is best modelled with a Gaussian curve.  A Gaussian curve is
+parameterized by its mean, standard deviation and normalization.
+MPFIT will, within certain constraints, find the set of parameters
+which best fits the data.  The fit is "best" in the least-squares
+sense; that is, the sum of the weighted squared differences between
+the model and data is minimized.
+
+The Levenberg-Marquardt technique is a particular strategy for
+iteratively searching for the best fit.  This particular
+implementation is drawn from a robust routine called MINPACK-1 (see
+NETLIB).  This version allows upper and lower bounding constraints
+to be placed on each parameter, or the parameter can be held fixed.
+
+The user-supplied function should compute an array of weighted
+deviations between model and data.  In a typical scientific problem
+the residuals should be weighted so that each deviate has a
+gaussian sigma of 1.0.  If X represents values of the independent
+variable, Y represents a measurement for each value of X, and ERR
+represents the error in the measurements, then the deviates could
+be calculated as follows:
+
+    for (i=0; i<m; i++) {
+      deviates[i] = (y[i] - f(x[i])) / err[i];
+    }
+
+where m is the number of data points, and where F is the 
+function representing the model evaluated at X.  If ERR are the
+1-sigma uncertainties in Y, then the sum of deviates[i] squared will
+be the total chi-squared value, which MPFIT will seek to minimize.
+
+Simple constraints can be placed on parameter values by using the
+"pars" parameter to MPFIT, and other parameter-specific options can be
+set.  See below for a description of this optional parameter
+descriptor.
+
+MPFIT does not perform more general optimization tasks.  MPFIT is
+customized, based on MINPACK-1, to the least-squares minimization
+problem.
+
+BASIC CALLING INTERFACE of MPFIT
+
+At the very least, the user must supply a 'user function,' which
+calculates the residuals as described above, and one or more
+parameters.  The calling interface to mpfit is:
+
+  int mpfit(mp_func funct, int m, int npar,
+            double *xall, mp_par *pars, mp_config *config, 
+            void *private_data, 
+            mp_result *result);
+
+The user function <b>funct</b> is a C function which computes the
+residuals using any user data that is required.  The number of
+residuals is specified by the integer <b>m</b>.  The nature and
+properties of user function are described in greater detail below.
+
+The user function parameters are passed to mpfit as the array
+  double xall[npar];
+where <b>npar</b> is the number of parameters of the user function.
+It must be the case that m > npar, i.e. there are more data points
+than free parameters.  The user must pass an initial "best guess" of
+the user parameters to mpfit(), and upon successful return, the xall[]
+array will contain the "best fit" parameters of the fit (and the original
+values are overwritten).
+
+The user function is responsible to compute an array of generic
+residuals.  Usually these residuals will depend on user-dependent
+quantities such as measured data or independent variables such "x"
+when fitting a function of the form y(x).  The user should pass these
+quantities using the optional <b>private_data</b> pointer.  If
+private_data is not used then a null pointer should be passed.
+Otherwise, it can be any C pointer, typically a pointer to a structure
+such as this:
+  struct example_private_data {  /* EXAMPLE: fitting y(x) */
+    double *x;         /* x - independent variable of model */
+    double *y;         /* y - measured "y" values */
+    double *y_error;   /* y_error - measurement uncertainty in y */
+  };
+The user would be responsible to declare such a structure, and to fill
+it with pointers to the relevant data arrays before calling mpfit().
+mpfit() itself does not inspect or modify the private_data contents,
+but merely passes it along to the user function
+
+The structure <b>pars</b> is optional, and allows the user to specify
+additional information and constraints about each user function
+parameter.  For example, the user can specify simple bounding
+constraints.  If passed, then pars[] must be dimensioned as,
+  mp_par pars[npar];
+where mp_par is a structure defined in mpfit.h.  If no special 
+parameter information is necessary, then the user can pass 
+pars == 0.
+
+The optional structure <b>config</b> configures how mpfit() behaves,
+and is described in greater detail below.  By passing a null pointer,
+the user obtains the default behavior.
+
+The structure <b>result</b> contains results of the fit, returned by
+mpfit().  The user should pass a pointer to a structure of type
+'mp_result' (which should be zeroed), and upon return, the structure
+is filled with various diagnostic quantities about the fitting run.
+These quantities are described in greater detail below.  If these
+diagnostics are not required, then the user may pass a null pointer.
+
+
+USER FUNCTION
+
+The user must define a function which computes the appropriate values
+as specified above.  The function must compute the weighted deviations
+between the model and the data.  The user function may also optionally
+compute explicit derivatives (see below).  The user function should
+be of the following form:
+
+  int myfunct(int m, int n, double *p, double *deviates,
+              double **derivs, void *private)
+  {
+    int i;
+    /* Compute function deviates */
+    for (i=0; i<m; i++) {
+      deviates[i] = {function of p[] and private data};
+    }
+
+    return 0;
+  }
+
+The user function parameters are defined as follows:
+  int m     - number of data points
+  int n     - number of parameters
+  double *p - array of n parameters 
+  double *deviates - array of m deviates to be returned by myfunct()
+  double **derivs - used for user-computed derivatives (see below)
+                    (= 0  when automatic finite differences are computed)
+
+User functions may also indicate a fatal error condition by returning
+a negative error code.  Error codes from -15 to -1 are reserved for
+the user function.
+
+
+EXAMPLE 1 - USER FUNCTION y(x)
+
+Here is a sample user function which computes the residuals for a simple
+model fit of the form y = f(x).   The residuals are defined as 
+(y[i] - f(x[i]))/y_error[i].   We will use the example_private_data
+structure defined above.  The user function would appear like this:
+
+  int myfunct_y_x(int m, int n, double *p, double *deviates,
+                 double **derivs, struct example_private_data *private)
+  {
+    int i;
+    double *x, *y, *y_error;
+
+    /* Retrieve values of x, y and y_error from private structure */
+    x = private.x;
+    y = private.y;
+    y_error = private.y_error;
+
+    /* Compute function deviates */
+    for (i=0; i<m; i++) {
+      deviates[i] = (y[i] - f(x[i], p)) / y_error[i];
+    }
+
+    return 0;
+  }
+
+In this example, the user would be responsible to define the function
+f(x,p) where x is the independent variable and p is the array of user
+parameters.  Although this example shows y = f(x,p), it is trivial to
+extend it to additional independent variables, such as u = f(x,y,p),
+by adding additional elements to the private data structure.
+
+The setup and call to mpfit() could look something like this:
+  int m = 10;  /* 10 data points */
+  double x[10] = {1,2,3,4,5,6,7,8,9,10}; /* Independent variable */
+  double y[10] = {-0.42,1.59,0.8,3.2,2.09,6.39,6.29,9.01,7.47,7.58};/* Y 
measurements*/
+  double y_error[10] = {1,1,1,1,1,1,1,1,1,1}; /* Y uncertainties */
+  struct example_private_data private;
+
+  double xall[2] = { 1.0, 1.0 };  /* User parameters - initial guess */
+  /* Fill private data structure to pointers with user data */
+  private.x = x;
+  private.y = y;
+  private.y_error = y_error;
+
+  /* Call mpfit() */
+  status = mpfit(&myfunct_y_x, m, npar, xall, 0, 0, &private, 0);
+
+USER-COMPUTED DERIVATIVES
+
+The user function can also compute function derivatives, which are
+used in the minimization process.  This can be useful to save time, or
+when the derivative is tricky to evaluate numerically.  
+
+Users should pass the "pars" parameter (see below), and for the 'side'
+structure field, the value of 3 should be passed.  This indicates to
+mpfit() that it should request the user function will compute the
+derivative.  NOTE: this setting is specified on a parameter by
+parameter basis.  This means that users are free to choose which
+parameters' derivatives are computed explicitly and which
+numerically.  A combination of both is allowed.  However, since the
+default value of 'side' is 0, derivative estimates will be numerical
+unless explicitly requested using the 'side' structure field.
+
+The user function should only compute derivatives when the 'derivs'
+parameter is non-null.  Note that even when user-computed derivatives
+are enabled, mpfit() may call the user function with derivs set to
+null, so the user function must check before accessing this pointer.
+
+The 'derivs' parameter is an array of pointers, one pointer for each
+parameter.  If derivs[i] is non-null, then derivatives are requested
+for the ith parameter.  Note that if some parameters are fixed, or
+pars[i].side is not 3 for some parameters, then derivs[i] will be null
+and the derivatives do not need to be computed for those parameters.
+
+derivs[j][i] is the derivative of the ith deviate with respect to the
+jth parameter (for 0<=i<m, 0<=j<n).  Storage has already been
+allocated for this array, and the user is not responsible for freeing
+it.  The user function may assume that derivs[j][i] are initialized to
+zero.
+
+The function prototype for user-computed derivatives is:
+
+  int myfunct_with_derivs(int m, int n, double *p, double *deviates,
+                          double **derivs, void *private)
+  {
+    int i;
+    /* Compute function deviates */
+    for (i=0; i<m; i++) {
+      deviates[i] = {function of x[i], p and private data};
+    }
+
+    /* If derivs is non-zero then user-computed derivatives are 
+       requested */
+    if (derivs) {
+      int j;
+      for (j=0; j<n; j++) if (derivs[j]) {
+        /* It is only required to compute derivatives when
+           derivs[ipar] is non-null */
+        for (i=0; i<m; i++) {
+          derivs[j][i] = {derivative of the ith deviate with respect to
+                          the jth parameter = d(deviates[i])/d(par[j])}
+        }
+      }
+    }
+
+    return 0;
+  }
+
+TESTING EXPLICIT DERIVATIVES
+
+In principle, the process of computing explicit derivatives should be
+straightforward.  In practice, the computation can be error prone,
+often being wrong by a sign or a scale factor.
+
+In order to be sure that the explicit derivatives are correct, the
+user can set pars[i].deriv_debug = 1 for parameter i (see below for a
+description of the "pars" structure).  This will cause mpfit() to
+print *both* explicit derivatives and numerical derivatives to the
+console so that the user can compare the results.  This would
+typically be used during development and debugging to be sure the
+calculated derivatives are correct, than then deriv_debug would be set
+to zero for production use.
+
+If you want debugging derivatives, it is important to set pars[i].side
+to the kind of numerical derivative you want to compare with.
+pars[i].side should be set to 0, 1, -1, or 2, and *not* set to 3.
+When pars[i].deriv_debug is set, then mpfit() automatically
+understands to request user-computed derivatives.
+
+The console output will be sent to the standard output, and will
+appear as a block of ASCII text like this:
+  FJAC DEBUG BEGIN
+  # IPNT FUNC DERIV_U DERIV_N DIFF_ABS DIFF_REL
+  FJAC PARM 1
+  ....  derivative data for parameter 1 ....
+  FJAC PARM 2
+  ....  derivative data for parameter 2 ....
+  ....  and so on ....
+  FJAC DEBUG END
+
+which is to say, debugging data will be bracketed by pairs of "FJAC
+DEBUG" BEGIN/END phrases.  Derivative data for individual parameter i
+will be labeled by "FJAC PARM i".  The columns are, in order,
+
+  IPNT - data point number j
+  FUNC - user function evaluated at X[j]
+  DERIV_U - user-calculated derivative d(FUNC(X[j]))/d(P[i])
+  DERIV_N - numerically calculated derivative according to pars[i].side value
+  DIFF_ABS - difference between DERIV_U and DERIV_N = fabs(DERIV_U-DERIV_N)
+  DIFF_REL - relative difference = fabs(DERIV_U-DERIV_N)/DERIV_U
+
+Since individual numerical derivative values may contain significant
+round-off errors, it is up to the user to critically compare DERIV_U
+and DERIV_N, using DIFF_ABS and DIFF_REL as a guide. 
+
+
+CONSTRAINING PARAMETER VALUES WITH THE PARS PARAMETER
+
+The behavior of MPFIT can be modified with respect to each
+parameter to be fitted.  A parameter value can be fixed; simple
+boundary constraints can be imposed; and limitations on the
+parameter changes can be imposed.
+
+If fitting constraints are to be supplied, then the user should pass
+an array of mp_par structures to mpfit() in the pars parameter.  If
+pars is set to 0, then the fitting parameters are asssumed to be
+unconstrained.
+
+pars should be an array of structures, one for each parameter.
+Each parameter is associated with one element of the array, in
+numerical order.  The structure is declared to have the following
+fields:
+
+    .fixed - a boolean value, whether the parameter is to be held
+             fixed or not.  Fixed parameters are not varied by
+             MPFIT, but are passed on to MYFUNCT for evaluation.
+ 
+    .limited - a two-element boolean array.  If the first/second
+               element is set, then the parameter is bounded on the
+               lower/upper side.  A parameter can be bounded on both
+               sides.
+
+ 
+    .limits - a two-element float or double array.  Gives the
+              parameter limits on the lower and upper sides,
+              respectively.  Zero, one or two of these values can be
+              set, depending on the values of LIMITED.
+
+    .parname - a string, giving the name of the parameter.  The
+               fitting code of MPFIT does not use this tag in any
+               way.  However, it may be used for output purposes.
+
+    .step - the step size to be used in calculating the numerical
+            derivatives.  If set to zero, then the step size is
+            computed automatically.
+            This value is superceded by the RELSTEP value.
+
+    .relstep - the *relative* step size to be used in calculating
+               the numerical derivatives.  This number is the
+               fractional size of the step, compared to the
+               parameter value.  This value supercedes the STEP
+               setting.  If the parameter is zero, then a default
+               step size is chosen.
+
+    .side   - the sidedness of the finite difference when computing
+              numerical derivatives.  This field can take four
+              values:
+
+                 0 - one-sided derivative computed automatically
+                 1 - one-sided derivative (f(x+h) - f(x)  )/h
+                -1 - one-sided derivative (f(x)   - f(x-h))/h
+                 2 - two-sided derivative (f(x+h) - f(x-h))/(2*h)
+                 3 - user-computed explicit derivatives
+
+             Where H is the STEP parameter described above.  The
+             "automatic" one-sided derivative method will chose a
+             direction for the finite difference which does not
+             violate any constraints.  The other methods do not
+             perform this check.  The two-sided method is in
+             principle more precise, but requires twice as many
+             function evaluations.  Default: 0.
+
+    .deriv_debug - flag to enable/disable console debug logging of
+             user-computed derivatives, as described above.  1=enable
+             debugging; 0=disable debugging.  Note that when
+             pars[i].deriv_debug is set, then pars[i].side should be
+             set to 0, 1, -1 or 2, depending on which numerical
+             derivative you wish to compare to.
+             Default: 0.
+ 
+RETURN VALUE of MPFIT()
+
+mpfit() returns an integer status code.  In principle, any positive return
+value from mpfit() indicates success or partial success.
+
+The possible error codes are integer constants:
+MP_ERR_INPUT        /* General input parameter error */
+MP_ERR_NAN          /* User function produced non-finite values */
+MP_ERR_FUNC         /* No user function was supplied */
+MP_ERR_NPOINTS      /* No user data points were supplied */
+MP_ERR_NFREE        /* No free parameters */
+MP_ERR_MEMORY       /* Memory allocation error */
+MP_ERR_INITBOUNDS   /* Initial values inconsistent w constraints*/
+MP_ERR_BOUNDS       /* Initial constraints inconsistent */
+MP_ERR_PARAM        /* General input parameter error */
+MP_ERR_DOF          /* Not enough degrees of freedom */
+
+The possible success status codes are:
+MP_OK_CHI           /* Convergence in chi-square value */
+MP_OK_PAR           /* Convergence in parameter value */
+MP_OK_BOTH          /* Both MP_OK_PAR and MP_OK_CHI hold */
+MP_OK_DIR           /* Convergence in orthogonality */
+MP_MAXITER          /* Maximum number of iterations reached */
+MP_FTOL             /* ftol is too small; no further improvement*/
+MP_XTOL             /* xtol is too small; no further improvement*/
+MP_GTOL             /* gtol is too small; no further improvement*/
+
+
+CONFIGURING MPFIT()
+
+mpfit() is primarily configured using the config parameter, which in
+turn is defined as a pointer to the following structure:
+
+  struct mp_config_struct {
+    /* NOTE: the user may set the value explicitly; OR, if the passed
+       value is zero, then the "Default" value will be substituted by
+       mpfit(). */
+    double ftol;    /* Relative chi-square convergence criterium Default: 
1e-10 */
+    double xtol;    /* Relative parameter convergence criterium  Default: 
1e-10 */
+    double gtol;    /* Orthogonality convergence criterium       Default: 
1e-10 */
+    double epsfcn;  /* Finite derivative step size               Default: 
MP_MACHEP0 */
+    double stepfactor; /* Initial step bound                     Default: 
100.0 */
+    double covtol;  /* Range tolerance for covariance calculation Default: 
1e-14 */
+    int maxiter;    /* Maximum number of iterations.  If maxiter == MP_NO_ITER,
+                       then basic error checking is done, and parameter
+                       errors/covariances are estimated based on input
+                       parameter values, but no fitting iterations are done. 
+                       Default: 200
+                 */
+    int maxfev;     /* Maximum number of function evaluations, or 0 for no 
limit
+                      Default: 0 (no limit) */
+    int nprint;     /* Default: 1 */
+    int douserscale;/* Scale variables by user values?
+                      1 = yes, user scale values in diag;
+                      0 = no, variables scaled internally (Default) */
+    int nofinitecheck; /* Disable check for infinite quantities from user?
+                       0 = do not perform check (Default)
+                       1 = perform check 
+                      */
+    mp_iterproc iterproc; /* Placeholder pointer - must set to 0 */
+  };
+
+Generally speaking, a value of 0 for a field in the structure above
+indicates that a default value should be used, indicated in
+parentheses.  Therefore, a user should zero this structure before
+passing it.
+
+
+EXTRACTING RESULTS FROM MPFIT()
+
+The basic result of mpfit() is the set of updated parameters, xall.
+However, there are other auxiliary quantities that the user can
+extract by using the results parameter.  This is a structure defined
+like this:
+
+  /* Definition of results structure, for when fit completes */
+  struct mp_result_struct {
+    double bestnorm;     /* Final chi^2 */
+    double orignorm;     /* Starting value of chi^2 */
+    int niter;           /* Number of iterations */
+    int nfev;            /* Number of function evaluations */
+    int status;          /* Fitting status code */
+
+    int npar;            /* Total number of parameters */
+    int nfree;           /* Number of free parameters */
+    int npegged;         /* Number of pegged parameters */
+    int nfunc;           /* Number of residuals (= num. of data points) */
+  
+    double *resid;       /* Final residuals
+                         nfunc-vector, or 0 if not desired */
+    double *xerror;      /* Final parameter uncertainties (1-sigma)
+                         npar-vector, or 0 if not desired */
+    double *covar;       /* Final parameter covariance matrix
+                         npar x npar array, or 0 if not desired */
+    char version[20];    /* MPFIT version string */
+  };  
+
+All of the scalar numeric quantities are filled when mpfit() returns,
+and any incoming value will be overwritten.
+
+If the user would like the final residuals, parameter errors, or the
+covariance matrix, then they should allocate the required storage, and
+pass a pointer to it in the corresponding structure field.  A pointer
+value of zero indicates that the array should not be returned.  Thus,
+by default, the user should zero this structure before passing it.
+The user is responsible for allocating and freeing resid, xerror and
+covar storage.
+
+The 'version' string can be used to interpret the behavior of MPFIT,
+in case the behavior changes over time.  Version numbers will be of
+the form "i.j" or "i.j.k" where i, j and k are integers.
+
+
+EXAMPLE 2 - Basic call
+
+This example does the basics, which is to perform the optimization
+with no constraints, and returns the scalar results in the result
+structure.
+
+  mp_result result;
+
+  memset(&result, 0, sizeof(result));
+  status = mpfit(myfunct, m, n, p, 0, 0, 0, &result);
+
+
+EXAMPLE 3 - Requesting Parameter Errors
+
+The only modification needed to retrieve parameter uncertainties is to
+allocate storage for it, and place a pointer to it in result.
+
+  mp_result result;
+  double perror[n];
+
+  memset(&result, 0, sizeof(result));
+  result.xerror = perror;
+  status = mpfit(myfunct, m, n, p, 0, 0, 0, &result);
+
+
+EXAMPLE 4 - Fixing a parameter
+
+This example fixes the third parameter (i.e. p[2]) at its starting
+value
+
+  mp_result result;
+  mp_par    pars[n];
+
+  memset(&pars[0], 0, sizeof(pars));
+  pars[2].fixed = 1;
+
+  memset(&result, 0, sizeof(result));
+  status = mpfit(myfunct, m, n, p, pars, 0, 0, &result);
+
+EXAMPLE 5 - Applying parameter constraints
+
+This example ensures that the third parameter (i.e. p[2]) is always
+greater or equal to ten.
+
+  mp_result result;
+  mp_par    pars[n];
+
+  memset(&pars[0], 0, sizeof(pars));
+  pars[2].limited[0] = 1;    /* limited[0] indicates lower bound */
+  pars[2].limits[0]  = 10.0; /* Actual constraint p[2] >= 10 */
+
+  memset(&result, 0, sizeof(result));
+  status = mpfit(myfunct, m, n, p, pars, 0, 0, &result);
+
+
+EXAMPLE 6 - Increasing maximum number of iterations
+
+This example changes the maximum number of iterations from its default
+to 1000.
+
+  mp_config config;
+  mp_result result;
+
+  memset(&config, 0, sizeof(config));
+  config.maxiter = 1000;
+  memset(&result, 0, sizeof(result));
+  status = mpfit(myfunct, m, n, p, 0, &config, 0, &result);
+
+
+EXAMPLE 7 - Passing private data to user function
+
+This example passes "private" data to its user function using the
+private parameter.  It assumes that three variables, x, y, and ey,
+already exist, and that the user function will know what to do with
+them.
+
+  mp_result result;
+  struct mystruct {
+    double *x;
+    double *y;
+    double *ey;
+  };
+
+  struct mystruct mydata;
+
+  mydata.x = x;
+  mydata.y = y;
+  mydata.ey = ey;
+
+  memset(&result, 0, sizeof(result));
+  status = mpfit(myfunct, m, n, p, 0, 0, (void *)&mydata, &result);
+
+
+EXAMPLE 8 - Complete example
+
+
+This example shows how to fit sample data to a linear function
+y = f(x) = a - b*x, where a and b are fitted parameters.  There is sample
+data included within the program.  The parameters are initialized to
+a = 1.0, b = 1.0, and then the fitting occurs.  The function
+residuals; within linfunc(), the value of f(x) is computed.  The main
+routine, <b>main()</b> initializes the variables and calls mpfit().
+
+
+#include "mpfit.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+
+/* This is the private data structure which contains the data points
+   and their uncertainties */
+struct vars_struct {
+  double *x;
+  double *y;
+  double *ey;
+};
+
+/* 
+ * linear fit function
+ *
+ * m - number of data points
+ * n - number of parameters (2)
+ * p - array of fit parameters 
+ * dy - array of residuals to be returned
+ * vars - private data (struct vars_struct *)
+ *
+ * RETURNS: error code (0 = success)
+ */
+int linfunc(int m, int n, double *p, double *dy, double **dvec, void *vars)
+{
+  int i;
+  struct vars_struct *v = (struct vars_struct *) vars;
+  double *x, *y, *ey, f;
+
+  x = v->x;
+  y = v->y;
+  ey = v->ey;
+
+  for (i=0; i<m; i++) {
+    f = p[0] - p[1]*x[i];     /* Linear fit function; note f = a - b*x */
+    dy[i] = (y[i] - f)/ey[i];
+  }
+
+  return 0;
+}
+
+/* Test harness routine, which contains test data, invokes mpfit() */
+int main(int argc, char *argv[])
+{
+  /* X - independent variable */
+  double x[] = {-1.7237128E+00,1.8712276E+00,-9.6608055E-01,
+               -2.8394297E-01,1.3416969E+00,1.3757038E+00,
+               -1.3703436E+00,4.2581975E-02,-1.4970151E-01,
+               8.2065094E-01};
+  /* Y - measured value of dependent quantity */
+  double y[] = {1.9000429E-01,6.5807428E+00,1.4582725E+00,
+               2.7270851E+00,5.5969253E+00,5.6249280E+00,
+               0.787615,3.2599759E+00,2.9771762E+00,
+               4.5936475E+00};
+  double ey[10];   /* Measurement uncertainty - initialized below */
+   
+  double p[2] = {1.0, 1.0};           /* Initial conditions */
+  double pactual[2] = {3.20, 1.78};   /* Actual values used to make data */
+  double perror[2];                   /* Returned parameter errors */      
+  int i;
+  struct vars_struct v;  /* Private data structure */
+  int status;
+  mp_result result;
+
+  memset(&result,0,sizeof(result));       /* Zero results structure */
+  result.xerror = perror;
+  for (i=0; i<10; i++) ey[i] = 0.07;   /* Data errors */           
+
+  /* Fill private data structure */
+  v.x = x;
+  v.y = y;
+  v.ey = ey;
+
+  /* Call fitting function for 10 data points and 2 parameters */
+  status = mpfit(linfunc, 10, 2, p, 0, 0, (void *) &v, &result);
+
+  printf("*** testlinfit status = %d\n", status);
+  /* ... print or use the results of the fitted parametres p[] here! ... */
+
+  return 0;
+}
+
+

[4/4] impala git commit: IMPALA-5310: Part 2: Add SAMPLED_NDV() function.

Reply via email to