[ https://issues.apache.org/jira/browse/SPARK-21485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094441#comment-16094441 ]

Sean Owen commented on SPARK-21485:
-----------------------------------

Given how it's generated, the docs are pretty complete, right? Meaning it 
documents all the operators and expressions that you could use in Spark SQL. 
The output doc is just what you get from the help messages in Spark SQL? Yeah, 
that does seem like a good thing.

> API Documentation for Spark SQL functions
> -----------------------------------------
>
>                 Key: SPARK-21485
>                 URL: https://issues.apache.org/jira/browse/SPARK-21485
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, SQL
>    Affects Versions: 2.3.0
>            Reporter: Hyukjin Kwon
>
> It looks like we can generate Spark's SQL function documentation from 
> {{ExpressionDescription}} and {{ExpressionInfo}}.
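> For context, here is roughly what these annotations look like on an 
> expression class in the Spark source (paraphrased from {{Abs}}; the exact 
> fields and wording vary by version):
> {code}
> @ExpressionDescription(
>   usage = "_FUNC_(expr) - Returns the absolute value of the numeric value.",
>   extended = """
>     Examples:
>       > SELECT _FUNC_(-1);
>        1
>   """)
> case class Abs(child: Expression) extends UnaryExpression {
>   // ...
> }
> {code}
> The {{_FUNC_}} placeholder gets substituted with the registered function 
> name, which is what the script below does.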
> I had some time to play with this, so I made a rough version: 
> https://spark-test.github.io/sparksqldoc/
> The code I used is below.
> In {{pyspark}} shell:
> {code}
> from collections import namedtuple
> ExpressionInfo = namedtuple("ExpressionInfo", "className usage name extended")
> # Fetch the ExpressionInfo of every built-in function through the Py4J gateway.
> jinfos = spark.sparkContext._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listBuiltinFunctions()
> infos = []
> for jinfo in jinfos:
>     name = jinfo.getName()
>     # Substitute the _FUNC_ placeholder with the actual function name.
>     usage = jinfo.getUsage()
>     usage = usage.replace("_FUNC_", name) if usage is not None else usage
>     extended = jinfo.getExtended()
>     extended = extended.replace("_FUNC_", name) if extended is not None else extended
>     infos.append(ExpressionInfo(
>         className=jinfo.getClassName(),
>         usage=usage,
>         name=name,
>         extended=extended))
> with open("index.md", 'w') as mdfile:
>     # Strip the per-line indentation left over from the Scala doc strings.
>     strip = lambda s: "\n".join(map(lambda u: u.strip(), s.split("\n")))
>     for info in sorted(infos, key=lambda i: i.name):
>         mdfile.write("### %s\n\n" % info.name)
>         if info.usage is not None:
>             mdfile.write("%s\n\n" % strip(info.usage))
>         if info.extended is not None:
>             mdfile.write("```%s```\n\n" % strip(info.extended))
> {code}
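> With the {{Abs}} annotation above, for example, this would produce an entry 
> roughly like:
> {code:none}
> ### abs
> 
> abs(expr) - Returns the absolute value of the numeric value.
> 
> ```
> Examples:
> > SELECT abs(-1);
> 1
> ```
> {code}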
> This change had to be made first, before running the code above:
> {code:none}
> +++ b/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala
> @@ -17,9 +17,15 @@
>  package org.apache.spark.sql.api.python
> +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
> +import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
>  import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
>  import org.apache.spark.sql.types.DataType
>  private[sql] object PythonSQLUtils {
>    def parseDataType(typeText: String): DataType = CatalystSqlParser.parseDataType(typeText)
> +
> +  def listBuiltinFunctions(): Array[ExpressionInfo] = {
> +    FunctionRegistry.functionSet.flatMap(f => FunctionRegistry.builtin.lookupFunction(f)).toArray
> +  }
>  }
> {code}
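> As an aside, the same listing can be sanity-checked from {{spark-shell}} 
> without the Python bridge, using the same (internal, so unstable) APIs the 
> patch relies on:
> {code}
> import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
> // lookupFunction returns Option[ExpressionInfo], so flatMap drops any misses.
> val infos = FunctionRegistry.functionSet
>   .flatMap(f => FunctionRegistry.builtin.lookupFunction(f))
>   .toSeq.sortBy(_.getName)
> infos.take(3).foreach(i => println(s"${i.getName}: ${i.getUsage}"))
> {code}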
> And then, I ran this:
> {code}
> mkdir docs
> echo "site_name: Spark SQL 2.3.0" >> mkdocs.yml
> echo "theme: readthedocs" >> mkdocs.yml
> mv index.md docs/index.md
> mkdocs serve
> {code}
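> ({{mkdocs serve}} previews the site locally, by default at 
> http://127.0.0.1:8000; {{mkdocs build}} would instead write the static site 
> into a {{site/}} directory for publishing.)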


