Hyukjin Kwon created SPARK-18667:
------------------------------------

             Summary: input_file_name function does not work with UDF
                 Key: SPARK-18667
                 URL: https://issues.apache.org/jira/browse/SPARK-18667
             Project: Spark
          Issue Type: Bug
          Components: PySpark
            Reporter: Hyukjin Kwon


{{input_file_name()}} returns an empty string instead of the file name when it is used as the input to a UDF in PySpark.

Given a file {{tmp.json}} with the data below:

{code}
{"a": 1}
{code}

the code below:

{code}
from pyspark.sql.functions import input_file_name, udf
from pyspark.sql.types import StringType

def filename(path):
    return path

sourceFile = udf(filename, StringType())
spark.read.json("tmp.json").select(sourceFile(input_file_name())).show()
{code}

prints an empty value:

{code}
+---------------------------+
|filename(input_file_name())|
+---------------------------+
|                           |
+---------------------------+
{code}

whereas calling {{input_file_name()}} directly, without the UDF:

{code}
spark.read.json("tmp.json").select(input_file_name()).show()
{code}

prints the file name correctly:

{code}
+--------------------+
|   input_file_name()|
+--------------------+
|file:///Users/hyu...|
+--------------------+
{code}

This seems to be a PySpark-specific issue.