GitHub user doanduyhai opened a pull request:
https://github.com/apache/incubator-zeppelin/pull/80
Add Scala utility functions for display
Until now, there are two ways to display data as a table:
1. Either use a **Spark DataFrame** and Zeppelin's built-in support
2. Or build the `%table ...` string manually and `println` it. For example, to
display an `RDD[(String,String,Int)]` representing a collection of users:
```scala
// Collect the RDD to the driver first: appending to a driver-side StringBuilder
// inside rdd.foreach would run on the executors and the data would be lost.
val data = new java.lang.StringBuilder("%table Login\tName\tAge\n")
rdd.collect().foreach {
  case (login, name, age) => data.append(s"$login\t$name\t$age\n")
}
println(data.toString())
```
My proposal is to add a new utility function that makes creating tables easier
than in the code example above. Of course one can always use a **Spark DataFrame**,
but I find it quite restrictive: people using Spark versions earlier than 1.3
cannot rely on DataFrame, and sometimes one does not want to transform an RDD
into a DataFrame just for display.
How are the utility functions implemented?
1. I added a new module, **spark-utils**, which provides the Scala code for the
display utility functions. This module uses the **maven-scala-plugin** to
compile all the classes in the package `org.apache.zeppelin.spark.utils`.
2. Right now the package `org.apache.zeppelin.spark.utils` contains a single
object, `DisplayUtils`, which augments RDDs of tuples or RDDs of Scala case
classes (all of them subclasses of trait `Product`) with the new method
`displayAsTable(columnLabels: String*)` (see the sketch at the end of this
description).
3. The `DisplayUtils` object is imported automatically into the
`SparkInterpreter` with `intp.interpret("import
org.apache.zeppelin.spark.utils.DisplayUtils._");`
4. The Maven module **interpreter** now has a **runtime** dependency on the
module **spark-utils**, so that the utility class is loaded at runtime.
5. Usage of the new display utility function is as follows:
**Paragraph1**
```scala
import org.apache.spark.rdd.RDD

case class Person(login: String, name: String, age: Int)

val rddTuples: RDD[(String, String, Int)] =
  sc.parallelize(List(("jdoe", "John DOE", 32), ("hsue", "Helen SUE", 27)))

val rddCaseClass: RDD[Person] =
  sc.parallelize(List(Person("jdoe", "John DOE", 32), Person("hsue", "Helen SUE", 27)))
```
**Paragraph2**
```scala
rddTuples.displayAsTable("Login","Name","Age")
```
**Paragraph3**
```scala
rddCaseClass.displayAsTable("Login","Name","Age")
```
6. The `displayAsTable()` method is error-proof: if the user provides **more**
column labels than the number of elements in the tuple/case class, the extra
labels are ignored. If the user provides **fewer** column labels than expected,
the method pads the missing column headers with **Column2**, **Column3**, etc.
(see the sketch at the end of this description).
7. In addition to the `displayAsTable` method, I added some other utility
methods to make it easier to handle custom HTML and images (a sketch follows
this list):
a. calling `html()` will generate the string `"%html "`
b. calling `html("<p> This is a test</p>")` will generate the string
`"%html <p> This is a test</p>"`
c. calling `img("http://www.google.com")` will generate the string
`"<img src='http://www.google.com' />"`
d. calling `img64()` will generate the string `"%img "`
e. calling `img64("ABCDE123")` will generate the string `"%img ABCDE123"`
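As a rough sketch, the helpers from point 7 could look like the following. The
method names and the produced strings come from the list above; the default
parameters and exact signatures are assumptions of this sketch, not necessarily
the committed code:
```scala
object DisplayUtils {

  // Sketch of the helpers from point 7: the output strings follow the
  // examples above; the default "" parameters are an assumption.
  def html(htmlContent: String = ""): String = s"%html $htmlContent"

  def img(url: String): String = s"<img src='$url' />"

  def img64(base64Content: String = ""): String = s"%img $base64Content"
}
```
With the automatic import from point 3, a paragraph could then simply call, for
example, `println(html("<h3>Users</h3>"))`.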
Of course, the `DisplayUtils` object can be extended with other functions to
support more advanced display features in the future.
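For reference, here is a minimal sketch of how the `displayAsTable` enrichment
from points 2 and 6 could be written. Only the `displayAsTable(columnLabels: String*)`
signature and the padding behaviour come from the description above; the wrapper
class name and the internals are illustrative assumptions:
```scala
import org.apache.spark.rdd.RDD

object DisplayUtils {

  // Illustrative enrichment: any RDD of Product (tuples or case classes)
  // gains a displayAsTable method. The class name is hypothetical.
  implicit class ProductRDDDisplay[T <: Product](rdd: RDD[T]) {

    def displayAsTable(columnLabels: String*): Unit = {
      val rows = rdd.collect()
      // Column count is taken from the first row of the RDD
      val arity = rows.headOption.map(_.productArity).getOrElse(0)
      // Extra labels are dropped; missing ones are padded with Column2, Column3, ...
      val headers = (0 until arity).map { i =>
        if (i < columnLabels.size) columnLabels(i) else s"Column${i + 1}"
      }
      val builder = new StringBuilder("%table ")
      builder.append(headers.mkString("\t")).append('\n')
      rows.foreach(row => builder.append(row.productIterator.mkString("\t")).append('\n'))
      println(builder.toString())
    }
  }
}
```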
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/doanduyhai/incubator-zeppelin DisplayUtils
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-zeppelin/pull/80.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #80
----
commit 5edab7130e70cf9f765dc268648d8ef294251b37
Author: DuyHai DOAN <[email protected]>
Date: 2015-05-23T19:49:33Z
Add new module spark-utils to expose utility functions for display
----