kosiew commented on code in PR #1416:
URL: https://github.com/apache/datafusion-python/pull/1416#discussion_r2929384345


##########
python/datafusion/functions.py:
##########
@@ -1894,6 +1894,15 @@ def approx_distinct(
     Args:
         expression: Values to check for distinct entries
         filter: If provided, only compute against rows for which the filter is True
+
+    Examples:
+    ---------
+    >>> ctx = dfn.SessionContext()
+    >>> df = ctx.from_pydict({"a": [1, 1, 2, 3]})
+    >>> result = df.aggregate(
+    ...     [], [dfn.functions.approx_distinct(dfn.col("a")).alias("v")])
+    >>> result.collect_column("v")[0].as_py() >= 2

Review Comment:
   `>= 2` is a weak regression signal for a 4-row input with 3 distinct values: the check would still pass if the function returned 2, 4, or any larger number.
   
   Could we pick an input for which the approximation is stable enough to assert a concrete answer, or at least tighten the bound so the example documents the intended behavior more clearly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

