GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/13212
[SPARK-15431][SQL] Support LIST FILE(s)|JAR(s) command natively

## What changes were proposed in this pull request?

Currently, the command `ADD FILE|JAR <filepath|jarpath>` is supported natively in Spark SQL. However, a file or jar added this way cannot be looked up with the `LIST FILE(s)|JAR(s)` command, because the LIST command is passed to the Hive command processor in Spark-SQL and is simply not supported in Spark-shell. Users therefore have no way to find out which files or jars have been added to the Spark context.

Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)

This PR adds support for the following commands:

`LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`

### For example:

##### LIST FILE(s)

```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
+----------------------------------------------+

scala> spark.sql("list files").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
+----------------------------------------------+
```

##### LIST JAR(s)

```
scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
res9: org.apache.spark.sql.DataFrame = [result: int]

scala> spark.sql("list jar TestUDTF.jar").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+

scala> spark.sql("list jars").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
```

## How was this patch tested?

New test cases were added for the Spark-SQL, Spark-Shell and SparkContext API code paths.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark list_command

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13212

----
commit 3866e3dcbfbd9fe0e18ecde3b23bb14757e06a0c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-08T07:06:36Z

    spark-15206 add testcases for distinct aggregate in having clause following up PR12974

commit 951d3edc412ef3d6f77d70a4dd7dd7add966d7b1
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-08T07:09:44Z

    Revert "spark-15206 add testcases for distinct aggregate in having clause following up PR12974"

    This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.

commit 5b30cc3c0eb20c134e21942ef96a26e452f9171c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-17T22:09:57Z

    adding spark native support for LIST FILES/JARS

commit 6396ec1591134ca3fd754a6a2684bc8b81218951
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-17T22:52:31Z

    update testcase

commit 79e97be7917d23f44f60cc857a471b14cb96831c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T07:07:02Z

    support listing specific file(s)

commit a4dc6164ff51b428dae282aa90042758c4ae33d7
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T07:33:50Z

    update testcases

commit 688c294060cb00cd6c387591bf700e58bdd3dba8
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T22:57:16Z

    align with PR 13122

commit a0a76a3c5ff93dbf42f07bebd54b7a3514e87132
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T23:07:32Z

    code style

commit 923988ac5d21e0c0afc6bf76d21a27e8f46f1246
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T23:11:36Z

    code style

commit 21b092ab84b22abec93fde1fc1ca177db68d9f0d
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-20T04:16:26Z

    update comments

----