[
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625691#comment-14625691
]
Ted Yu commented on HBASE-13992:
--------------------------------
Uploading onto reviewboard would make reviewing easier.
{code}
+ <spark.version>1.3.0</spark.version>
{code}
Should a newer Spark release, such as 1.4.0, be used?
{code}
+ <!-- <scope>test</scope> Return-->
{code}
Should the above be uncommented?
Please add a short javadoc for the XXExample classes.
{code}
+ System.out
+ .println("JavaHBaseBulkGetExample {master} {tableName}");
{code}
Merge the above two lines.
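To illustrate the two preceding suggestions (a short class-level javadoc and a single-line usage message), here is a minimal sketch. The class name `JavaHBaseBulkGetExampleSketch`, the `USAGE` constant, the `hasRequiredArgs` helper, and the argument check are hypothetical and not taken from the patch; only the usage string itself comes from the snippet above.

```java
/**
 * Hypothetical sketch of an example class carrying a short class-level javadoc.
 * The real example presumably performs a bulk get against an HBase table;
 * the Spark and HBase wiring is deliberately left out here.
 */
final class JavaHBaseBulkGetExampleSketch {

  /** Usage text held in one constant so the println call fits on a single line. */
  static final String USAGE = "JavaHBaseBulkGetExample {master} {tableName}";

  /** Assumption: the example needs exactly the two arguments named in USAGE. */
  static boolean hasRequiredArgs(String[] args) {
    return args != null && args.length >= 2;
  }

  public static void main(String[] args) {
    if (!hasRequiredArgs(args)) {
      System.out.println(USAGE);
      return;
    }
    System.out.println("master=" + args[0] + ", tableName=" + args[1]);
  }
}
```

Pulling the usage text into a constant is just one way to keep the call on one line; a plain `System.out.println("...")` on a single line would do as well.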
For JavaHBaseDistributedScan:
{code}
+ results.size();
+ }
{code}
Did you intend to print the result size? As written, the value returned by size() is discarded.
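If printing was the intent, something along these lines would make the size visible. This is only a sketch on a plain `java.util.List`; the class `ResultSizeSketch`, the `describe` helper, and the message text are hypothetical, and the actual type of `results` in the patch may differ.

```java
import java.util.Arrays;
import java.util.List;

final class ResultSizeSketch {

  /** Builds a message from the size instead of discarding the return value. */
  static String describe(List<?> results) {
    return "Number of results: " + results.size();
  }

  public static void main(String[] args) {
    // Stand-in data; in the example class this would be the scan output.
    List<String> results = Arrays.asList("row1", "row2", "row3");
    System.out.println(describe(results));
  }
}
```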
For JavaHBaseMapGetPutExample, GetFunction isn't called.
{code}
+ .println("JavaHBaseBulkPutExample {master} {host} {post} {tableName} {columnFamily}");
{code}
post -> port
For HBaseContext:
{code}
+ * serializable Configuration object
{code}
HBaseContext takes 3 parameters, and the above documents only one of them. Did
you intend to provide scaladoc for all of them?
{code}
+ def mapPartition[T, R: ClassTag](rdd: RDD[T],
{code}
Should the above method be called mapPartitions (to align with the RDD method of the same name)?
{code}
+ def streamForeachRDD[T](dstream: DStream[T],
{code}
Should the method be called streamForeachPartition, since there is already a
foreachRDD method which accepts a DStream?
> Integrate SparkOnHBase into HBase
> ---------------------------------
>
> Key: HBASE-13992
> URL: https://issues.apache.org/jira/browse/HBASE-13992
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-13992.patch
>
>
> This Jira is to ask if SparkOnHBase can find a home inside HBase core.
> Here is the GitHub repository:
> https://github.com/cloudera-labs/SparkOnHBase
> I am the core author of this project and the license is Apache 2.0.
> A blog post explaining this project is here:
> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
> A Spark Streaming example is here:
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> A blog post about a real customer using this in production is here:
> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
> Please debate and let me know what I can do to make this happen.