alamb commented on code in PR #22978:
URL: https://github.com/apache/datafusion/pull/22978#discussion_r3422863614


##########
datafusion/sqllogictest/test_files/parquet_metadata_functions.slt:
##########


Review Comment:
   I suggest consolidating the Parquet tests in 
datafusion/sqllogictest/test_files/input_file_name.slt so the tests for the 
same function live together in the same file.



##########
docs/source/user-guide/sql/scalar_functions.md:
##########
@@ -5959,6 +5960,26 @@ get_field(expression, field_name[, field_name2, ...])
 +--------+
 ```
 
+### `input_file_name`
+
+Returns the path of the input file that produced the current row.
+
+Note: file paths/URIs may be sensitive metadata depending on your environment.

Review Comment:
   That is an interesting point -- can we add a note to the upgrading guide 
(and the release notes) explaining how people for whom this is a security 
problem can disable the function?
   
   Maybe the explanation is just "register a function with the same name" 🤔 
 Or should we add a config flag in a follow-on PR to avoid registering these 🤔 
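
   To illustrate the "register a function with the same name" idea, here is a 
minimal sketch using only the standard library. It assumes the function 
registry is keyed by name, so that a later registration replaces the built-in. 
`Registry`, `ScalarFn`, and both functions below are hypothetical stand-ins, 
not DataFusion's actual API:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a scalar function: receives the scanned file's
// path and returns the value the query would see (None models SQL NULL).
type ScalarFn = fn(&str) -> Option<String>;

// Hypothetical name-keyed registry; NOT DataFusion's real function registry.
struct Registry {
    functions: HashMap<String, ScalarFn>,
}

impl Registry {
    fn new() -> Self {
        Self { functions: HashMap::new() }
    }

    // Registering under an existing name replaces the earlier entry and
    // returns it, so the caller can tell that a built-in was shadowed.
    fn register(&mut self, name: &str, f: ScalarFn) -> Option<ScalarFn> {
        self.functions.insert(name.to_string(), f)
    }

    // Look the function up by name and apply it to the given file path.
    fn call(&self, name: &str, path: &str) -> Option<String> {
        self.functions.get(name).and_then(|f| f(path))
    }
}

// Models the built-in behavior: expose the file path as-is.
fn builtin_input_file_name(path: &str) -> Option<String> {
    Some(path.to_string())
}

// Models a user-supplied override: always return NULL so no path leaks.
fn redacted_input_file_name(_path: &str) -> Option<String> {
    None
}
```

   Registering `redacted_input_file_name` under the name `input_file_name` 
after the built-in would make every later lookup return NULL. Whether this is 
enough in DataFusion depends on whether the file-scan-time rewrite still 
applies to the overriding function, which is worth checking before documenting 
it as the recommended workaround.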



##########
datafusion/functions/src/core/input_file_name.rs:
##########
@@ -0,0 +1,95 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`InputFileNameFunc`]: Implementation of the `input_file_name` function.
+
+use arrow::datatypes::DataType;
+use datafusion_common::{exec_err, utils::take_function_args};
+use datafusion_doc::Documentation;
+use datafusion_expr::{
+    ColumnarValue, ExpressionPlacement, ScalarFunctionArgs, ScalarUDFImpl, Signature,
+    Volatility,
+};
+use datafusion_macros::user_doc;
+
+#[user_doc(
+    doc_section(label = "Other Functions"),
+    description = r#"Returns the path of the input file that produced the current row.
+
+Note: file paths/URIs may be sensitive metadata depending on your environment.
+
+This function is intended to be rewritten at file-scan time (when the file is

Review Comment:
   It occurs to me that we might want to add some documentation (as a follow-on 
PR) to the TableProvider about functions that might need special handling (e.g. 
input_file_name, input_row_number, get_field).
   
   Otherwise people implementing table providers might not know that such 
handling would be helpful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

