[ https://issues.apache.org/jira/browse/SPARK-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-7438.
------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
Issue resolved by pull request 5974
[https://github.com/apache/spark/pull/5974]
> Validation Error while running countApproxDistinct with relative accuracy >= 0.38
> --------------------------------------------------------------------------------------
>
> Key: SPARK-7438
> URL: https://issues.apache.org/jira/browse/SPARK-7438
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Vinod KC
> Priority: Minor
> Fix For: 1.4.0
>
>
> Example code:
>   val a = sc.parallelize(1 to 10000, 20)
>   val b = a ++ a ++ a ++ a ++ a
>   b.countApproxDistinct(0.38)
> This fails with:
>   "java.lang.IllegalArgumentException: requirement failed: p (3) must be at least 4"
> Issue 1: When the relative accuracy is >= 0.38, an IllegalArgumentException
> is thrown because the precision p evaluates to 3.
> However, the same input works fine with countApproxDistinctByKey(0.38). The
> handling of relativeSD should be consistent between countApproxDistinct and
> countApproxDistinctByKey.
> Issue 2: The validation error message "p (3) must be at least 4" gives no
> clue about what went wrong.
> Issue 3: When the relative accuracy is < 0.000017, countApproxDistinct does
> not show a proper validation error message.
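
To illustrate why 0.38 yields p = 3, here is a minimal sketch. It assumes the precision is derived from relativeSD as p = ceil(2 * log2(1.054 / relativeSD)); that formula is an assumption not stated in this message. The object PrecisionSketch and the helpers precisionFor and safePrecisionFor are hypothetical names used only for illustration. They show one possible way to address Issues 1-3 and are not necessarily what pull request 5974 does.

```scala
object PrecisionSketch {
  // Assumed mapping from relative standard deviation to HyperLogLog precision p.
  // The constant 1.054 and the overall formula are assumptions for this sketch.
  def precisionFor(relativeSD: Double): Int =
    math.ceil(2.0 * math.log(1.054 / relativeSD) / math.log(2)).toInt

  // Hypothetical helper: reject very small accuracies with a descriptive
  // message (Issue 3) and clamp p to the minimum of 4 so that coarse
  // accuracies such as 0.38 no longer fail (Issues 1 and 2).
  def safePrecisionFor(relativeSD: Double): Int = {
    require(relativeSD > 0.000017,
      s"accuracy ($relativeSD) must be greater than 0.000017")
    math.max(precisionFor(relativeSD), 4)
  }
}
```

Under that assumed formula, precisionFor(0.38) evaluates to 3, which is what trips the "must be at least 4" requirement, while precisionFor(0.37) evaluates to 4. The boundary sits near relativeSD = 0.3727.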
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)