wizard-420 commented on a change in pull request #943:
URL: https://github.com/apache/systemml/pull/943#discussion_r435209789
##########
File path: dev/docs/builtins-reference.md
##########
@@ -291,34 +291,95 @@ y = X %*% rand(rows=ncol(X), 1)
[C, S] = steplm(X = X, y = y, icpt = 1);
```
-## `slicefinder`-Function
+## `outlier`-Function
-The `slicefinder`-function returns top-k worst performing subsets according to
a model calculation.
+An outlier in a data set is a value that lies more than 1.5 times the
interquartile range (IQR = Q3−Q1) below the lower quartile (Q1) or above the
upper quartile (Q3).
+Specifically, if a number is less than Q1−1.5×IQR or greater than Q3+1.5×IQR,
then it is an outlier.
### Usage
```r
-slicefinder(X,W, y, k, paq, S);
+outlier(X,opposite);
```
### Arguments
| Name | Type | Default | Description |
| :------ | :------------- | -------- | :---------- |
| X | Matrix[Double] | required | Recoded dataset into Matrix |
-| W | Matrix[Double] | required | Trained model |
-| y | Matrix[Double] | required | 1-column matrix of response values. |
-| k | Integer | 1 | Number of subsets required |
-| paq | Integer | 1 | amount of values wanted for each col,
if paq = 1 then its off |
-| S | Integer | 2 | amount of subsets to combine (for now
supported only 1 and 2) |
+|opposite| Boolean | required | Used for xor gate evaluation |
### Returns
| Type | Description |
| :------------- | :---------- |
-| Matrix[Double] | Matrix containing the information of top_K slices (relative
error, standart error, value0, value1, col_number(sort), rows,
cols,range_row,range_cols, value00, value01,col_number2(sort), rows2,
cols2,range_row2,range_cols2) |
+| Matrix[Double] | 1-column matrix of weights. |
-### Usage
+### Example
```r
X = rand (rows = 50, cols = 10)
-y = X %*% rand(rows=ncol(X), 1)
-w = lm(X = X, y = y)
-ress = slicefinder(X = X,W = w, Y = y, k = 5, paq = 1, S = 2);
+opposite = 1
+outlier(X=X,opposite=opposite)
```
+## outlierByIQR - Function
+
+Builtin function for detecting and repairing outliers using the interquartile
range (IQR).
+A commonly used rule says that a data point is an outlier if it is more than
1.5 IQR
+above the third quartile or below the first quartile.
+The outlierByIQR function computes a lower bound (Q1 - k*IQR) and an upper
bound (Q3 + k*IQR) for the matrix.
+Any value that is less than the lower bound or greater than the upper bound is
treated as an outlier
+and is repaired (deleted or replaced, depending on repairMethod).
+
+
+### Usage
+```r
+outlierByIQR(X, k, repairMethod, max_iterations, verbose)
+```
+### Arguments
+| Name | Type | Default | Description |
+| :------ | :------------- | -------- | :---------- |
+| X | Matrix[Double] | required | Matrix with outliers |
+| k | Double | 1.5 | Constant used to discern outliers (k*IQR) |
+| isIterative | Boolean | TRUE | Iterative repair or single repair |
+| repairMethod | Integer | 1 | 0 = delete rows having outliers, 1 = replace outliers with zeros, 2 = replace outliers as missing values |
+| max_iterations | Integer | 0 | 0 = arbitrary number of iterations until all outliers are removed, n = any constant defined by user |
+### Returns
+| Type | Description |
+| :------------- | :---------- |
+| Matrix[Double] | matrix without any outlier. |
+
+### Example
+```r
+X = rand (rows=10,cols=10)
+opposite = 1
+Y = outlier(X = X, opposite = opposite)
+Z = outlierByIQR(X=Y,k=1.5,repairMethod=0,max_iterations=3,verbose=1)
+print("\n"+toString(Z))
+```
+## outlierBySd - Function
+Builtin function for detecting and repairing outliers using standard deviation.
+
Review comment:
thanks for your help.
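
For reference, here is a minimal sketch of the IQR rule and the three repair options that the new docs describe. It is plain Python on a single column of values, not the DML builtin. The helper names `iqr_bounds` and `repair_outliers` are hypothetical, the quartiles come from `statistics.quantiles` with the "inclusive" method (the builtin may compute quartiles differently), and only a single repair pass is shown.

```python
import math
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    # Assumption: "inclusive" quartiles; other quartile definitions shift the bounds.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def repair_outliers(values, k=1.5, repair_method=1):
    """Single repair pass over one column (hypothetical helper).

    repair_method: 0 = delete outliers, 1 = replace with zero,
                   2 = replace with NaN (treated as a missing value).
    """
    lower, upper = iqr_bounds(values, k)

    def is_outlier(v):
        return v < lower or v > upper

    if repair_method == 0:
        return [v for v in values if not is_outlier(v)]
    fill = 0.0 if repair_method == 1 else math.nan
    return [fill if is_outlier(v) else v for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_bounds(data))                        # (-3.5, 14.5)
print(repair_outliers(data, repair_method=0))  # 100 is dropped
```

With these numbers Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and only the value 100 falls outside the bounds. The iterative mode from the arguments table would repeat this pass on the repaired data.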