Thanks Ted!

On Thu, Jan 14, 2016 at 4:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> For #1, yes it is possible.
>
> You can find some examples in the hbase-spark module of HBase, where HBase
> as a DataSource is provided.
> e.g.
>
> https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
>
> Cheers
>
> On Thu, Jan 14, 2016 at 5:04 AM, Kristoffer Sjögren <sto...@gmail.com>
> wrote:
>>
>> Hi
>>
>> We have an RDD<UserId> that needs to be mapped with information from
>> HBase, where the exact key is the user id.
>>
>> What are the alternatives for doing this?
>>
>> - Is it possible to do HBase.get() requests from a map function in Spark?
>> - Or should we join the RDD against a full HBase table scan?
>>
>> I ask because a full table scan feels inefficient, especially if the
>> input RDD<UserId> is really small compared to the full table. But I
>> realize that a full table scan may not be what happens in reality?
>>
>> Cheers,
>> -Kristoffer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
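For readers following the thread: the pattern Ted confirms as possible (option #1) is usually done with `mapPartitions` rather than `map`, so that one HBase connection is opened per partition and keys are sent as batched multi-gets instead of one RPC per record. Below is a minimal, hedged sketch of that batching shape. A plain in-memory `Map` stands in for the HBase table so the snippet is self-contained; in real code you would replace `store.get(id)` with a `Table.get(gets)` call on a connection created from `ConnectionFactory.createConnection` inside the partition function. All names here (`BatchedLookup`, `lookupPartition`, `store`) are illustrative, not from the thread.

```scala
// Sketch of per-partition batched lookups, the alternative to a full scan.
// The in-memory `store` is a hypothetical stand-in for an HBase table.
object BatchedLookup {
  // Stand-in for HBase: userId -> profile value
  val store: Map[Int, String] = Map(1 -> "alice", 2 -> "bob", 3 -> "carol")

  // Mirrors rdd.mapPartitions { iter => ... }: the iterator is the
  // partition's records; each batch would become one multi-get RPC.
  def lookupPartition(userIds: Iterator[Int],
                      batchSize: Int = 100): Iterator[(Int, Option[String])] =
    userIds.grouped(batchSize).flatMap { batch =>
      // In real code: build a List[Get] from `batch`, call table.get(gets),
      // and zip the Results back with the ids.
      batch.map(id => id -> store.get(id))
    }

  def main(args: Array[String]): Unit = {
    // Simulates one partition containing four user ids; id 4 has no row.
    println(lookupPartition(Iterator(1, 2, 3, 4)).toList)
  }
}
```

Because only the ids present in the input RDD are fetched, the cost scales with the size of the RDD rather than the size of the HBase table, which addresses the concern about scanning the full table for a small input.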