[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...

xwu0226 Thu, 19 May 2016 21:43:02 -0700

GitHub user xwu0226 opened a pull request:

    https://github.com/apache/spark/pull/13212


    [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s) command natively

    ## What changes were proposed in this pull request?
    Currently the command "ADD FILE|JAR <filepath|jarpath>" is supported natively
    in SparkSQL. However, a file or jar added this way cannot be looked up with
    the "LIST FILE(s)|JAR(s)" command: in Spark-SQL the LIST command is passed to
    the Hive command processor, and in Spark-shell it is simply not supported.
    As a result, users have no way to find out which files or jars have been
    added to the Spark context.
    Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)
    
    This PR adds native support for the following commands:
    `LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`
    
    ### For example:
    ##### LIST FILE(s)
    ```
    scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
    res1: org.apache.spark.sql.DataFrame = []
    scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
    res2: org.apache.spark.sql.DataFrame = []
    
    scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
    +----------------------------------------------+
    |result                                        |
    +----------------------------------------------+
    |hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
    +----------------------------------------------+
    
    scala> spark.sql("list files").show(false)
    +----------------------------------------------+
    |result                                        |
    +----------------------------------------------+
    |hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
    |hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
    +----------------------------------------------+
    ```
    
    ##### LIST JAR(s)
    ```
    scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
    res9: org.apache.spark.sql.DataFrame = [result: int]
    
    scala> spark.sql("list jar TestUDTF.jar").show(false)
    +---------------------------------------------+
    |result                                       |
    +---------------------------------------------+
    |spark://192.168.1.234:50131/jars/TestUDTF.jar|
    +---------------------------------------------+
    
    
    scala> spark.sql("list jars").show(false)
    +---------------------------------------------+
    |result                                       |
    +---------------------------------------------+
    |spark://192.168.1.234:50131/jars/TestUDTF.jar|
    +---------------------------------------------+
    ```
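
    ##### Sketch of the filtering behavior
    The examples above suggest that `LIST` without arguments returns every
    added resource, while `LIST` with arguments returns only the matching
    ones. The following is a minimal, Spark-free sketch of that behavior and
    is not the code in this patch. It assumes that matching is done on the
    last path segment, which is only inferred from the `list jar TestUDTF.jar`
    output; `ListResourcesSketch`, `baseName` and `listResources` are
    hypothetical names used for illustration.

    ```scala
    object ListResourcesSketch {
      // Hypothetical helper: last path segment of a URI or path string.
      def baseName(path: String): String =
        path.substring(path.lastIndexOf('/') + 1)

      // No arguments: return every registered resource.
      // With arguments: keep the registered resources whose base name equals
      // the base name of one of the requested paths (assumed rule).
      def listResources(registered: Seq[String], requested: Seq[String]): Seq[String] =
        if (requested.isEmpty) {
          registered
        } else {
          val wanted = requested.map(baseName).toSet
          registered.filter(r => wanted.contains(baseName(r)))
        }

      def main(args: Array[String]): Unit = {
        val files = Seq(
          "hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt",
          "hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
        // Mirrors "list file <path>": only the requested file is printed.
        listResources(files, Seq("hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt"))
          .foreach(println)
        // Mirrors "list files": every added file is printed.
        listResources(files, Seq.empty).foreach(println)
      }
    }
    ```

    Matching on the base name lets a short argument such as `TestUDTF.jar`
    find a jar that is registered under a full `spark://` URI.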
    ## How was this patch tested?
    New test cases were added for the Spark-SQL, Spark-Shell, and SparkContext
    API code paths.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark list_command

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13212
    
----
commit 3866e3dcbfbd9fe0e18ecde3b23bb14757e06a0c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-08T07:06:36Z

    spark-15206 add testcases for distinct aggregate in having clause
    following up PR12974

commit 951d3edc412ef3d6f77d70a4dd7dd7add966d7b1
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-08T07:09:44Z

    Revert "spark-15206 add testcases for distinct aggregate in having clause
    following up PR12974"
    
    This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.

commit 5b30cc3c0eb20c134e21942ef96a26e452f9171c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-17T22:09:57Z

    adding spark native support for LIST FILES/JARS

commit 6396ec1591134ca3fd754a6a2684bc8b81218951
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-17T22:52:31Z

    update testcase

commit 79e97be7917d23f44f60cc857a471b14cb96831c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T07:07:02Z

    support listing specific file(s)

commit a4dc6164ff51b428dae282aa90042758c4ae33d7
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T07:33:50Z

    update testcases

commit 688c294060cb00cd6c387591bf700e58bdd3dba8
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T22:57:16Z

    align with PR 13122

commit a0a76a3c5ff93dbf42f07bebd54b7a3514e87132
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T23:07:32Z

    code style

commit 923988ac5d21e0c0afc6bf76d21a27e8f46f1246
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-19T23:11:36Z

    code style

commit 21b092ab84b22abec93fde1fc1ca177db68d9f0d
Author: Xin Wu <xi...@us.ibm.com>
Date:   2016-05-20T04:16:26Z

    update comments

----


