Maciej Szymkiewicz created SPARK-19161:
------------------------------------------
Summary: Improving UDF Docstrings
Key: SPARK-19161
URL: https://issues.apache.org/jira/browse/SPARK-19161
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 2.1.0, 2.0.0, 1.6.0, 1.5.0, 2.2.0
Reporter: Maciej Szymkiewicz

Current state

Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:

{code}
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Type:        UserDefinedFunction
String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
Signature:   add_one(*cols)
Docstring:
User defined function in Python

.. versionadded:: 1.3

In [6]: help(add_one)

Help on UserDefinedFunction in module pyspark.sql.functions object:

class UserDefinedFunction(builtins.object)
 |  User defined function in Python
 |
 |  .. versionadded:: 1.3
 |
 |  Methods defined here:
 |
 |  __call__(self, *cols)
 |      Call self as a function.
 |
 |  __del__(self)
 |
 |  __init__(self, func, returnType, name=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
(END)
{code}

It is possible to extract the original function:

{code}
In [7]: ?add_one.func
Signature: add_one.func(x)
Docstring: Adds one
File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [8]: help(add_one.func)

Help on function _add_one in module __main__:

_add_one(x)
    Adds one
{code}

but this assumes that the end user is aware of the distinction between UDFs and built-in functions.

Proposed

Copy the input function's docstring to the UDF object or to a function wrapper.
{code}
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Signature: add_one(x)
Docstring:
Adds one

SQL Type: IntegerType
File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [6]: help(add_one)

Help on function _add_one in module __main__:

_add_one(x)
    Adds one

    SQL Type: IntegerType
(END)
{code}