You should make HBase a data source (it seems we already have an HBase connector?), 
create a DataFrame from HBase, and do the join in Spark SQL.
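
A minimal sketch of that approach, assuming the Apache HBase-Spark connector is on the classpath; the table name, column family, and column mapping below are hypothetical placeholders, not taken from the original question:

```scala
import org.apache.spark.sql.SparkSession

// Assumes a Spark session with Hive support; replace "sql...." with the real query.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Hive/Carbon side: must expose the id and mvcc columns.
val hiveDf = spark.sql("sql....")

// HBase side: load (id, value) as a DataFrame via the connector.
// Format string and options follow the hbase-spark connector's conventions;
// "my_table" and "cf:value" are assumed names for illustration.
val hbaseDf = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "my_table")
  .option("hbase.columns.mapping", "id STRING :key, value STRING cf:value")
  .load()

// Join on id and keep only rows where mvcc matches the HBase value,
// replacing the per-row HBase gets inside mapPartitions.
val result = hiveDf.join(hbaseDf, "id")
  .where(hiveDf("mvcc") === hbaseDf("value"))
```

This lets the Catalyst optimizer plan the lookup as a join (potentially a broadcast join if the HBase side is small), instead of paying for the extra stage that `mapPartitions` introduces.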

> On 21 Jun 2017, at 10:17 AM, sunerhan1...@sina.com wrote:
> 
> Hello,
> My scenario is like this:
>         1. val df = hivecontext/carboncontext.sql("sql....")
>         2. iterate over the rows, extract two columns, id and mvcc, and use id as the key 
> to scan HBase for the corresponding value;
>             if mvcc == value, the row passes, else it is dropped
> Is there a better way than dataframe.mapPartitions? It causes an 
> extra stage and takes more time.
> I put two DAGs in the appendix, please check!
> 
> Thanks!!
> sunerhan1...@sina.com <appendix.zip>
