[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

Ted Yu (JIRA) Mon, 13 Jul 2015 19:10:33 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625691#comment-14625691
 ]


Ted Yu commented on HBASE-13992:
--------------------------------

Uploading the patch to Review Board would make reviewing easier.
{code}
+        <spark.version>1.3.0</spark.version>
{code}
Should a newer Spark release, such as 1.4.0, be used?
{code}
+            <!-- <scope>test</scope> Return-->
{code}
Should the above be uncommented?
Please add a short javadoc for each of the XXExample classes.
{code}
+      System.out
+              .println("JavaHBaseBulkGetExample  {master} {tableName}");
{code}
Please merge the above two lines.

For JavaHBaseDistributedScan:
{code}
+    results.size();
+  }
{code}
Did you intend to print the result size?

For JavaHBaseMapGetPutExample, GetFunction isn't called.
{code}
+              .println("JavaHBaseBulkPutExample  {master} {host} {post} {tableName} {columnFamily}");
{code}
Typo: {post} -> {port}
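For illustration, here is a minimal sketch of the JavaHBaseBulkPutExample usage message with both suggestions applied. It is a single statement, and {port} replaces {post}. The class name UsageSketch, the usage() helper and the five-argument check are assumptions for this sketch and are not taken from the patch.

{code:java}
public class UsageSketch {
  // Hypothetical helper: the whole usage line in one string literal, with the {port} typo fixed.
  static String usage() {
    return "JavaHBaseBulkPutExample {master} {host} {port} {tableName} {columnFamily}";
  }

  public static void main(String[] args) {
    // Assumed check: the example expects five arguments (master, host, port, tableName, columnFamily).
    if (args.length < 5) {
      System.out.println(usage());
      return;
    }
    // The rest of the example would follow here.
  }
}
{code}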

For HBaseContext,
{code}
+  * serializable Configuration object
{code}
HBaseContext takes 3 parameters, and the above documents only one of them. Did you intend to
provide scaladoc for all of them?
{code}
+  def mapPartition[T, R: ClassTag](rdd: RDD[T],
{code}
Should the above method be called mapPartitions (to align with the corresponding RDD method)?
{code}
+  def streamForeachRDD[T](dstream: DStream[T],
{code}
Should the method be called streamForeachPartition, since DStream already provides
a foreachRDD method?

> Integrate SparkOnHBase into HBase
> ---------------------------------
>
>                 Key: HBASE-13992
>                 URL: https://issues.apache.org/jira/browse/HBASE-13992
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13992.patch
>
>
> This Jira is to ask if SparkOnHBase can find a home inside HBase core.
> Here is the GitHub repository:
> https://github.com/cloudera-labs/SparkOnHBase
> I am the core author of this project and the license is Apache 2.0
> A blog post explaining this project is here:
> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
> A Spark Streaming example is here:
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> A real customer using this in production is blogged about here:
> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
> Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
