Thanks Ted!

On Thu, Jan 14, 2016 at 4:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> For #1, yes it is possible.
>
> You can find some examples in the hbase-spark module of HBase, where HBase
> as a DataSource is provided.
> e.g.
>
> https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
>
> Cheers
>
> On Thu, Jan 14, 2016 at 5:04 AM, Kristoffer Sjögren <sto...@gmail.com>
> wrote:
>>
>> Hi
>>
>> We have an RDD<UserId> that needs to be mapped with information from
>> HBase, where the exact key is the user id.
>>
>> What are the alternatives for doing this?
>>
>> - Is it possible to do HBase.get() requests from a map function in Spark?
>> - Or should we join the RDD against a full HBase table scan?
>>
>> I ask because a full table scan feels inefficient, especially if the
>> input RDD<UserId> is really small compared to the full table. But I
>> realize that a full table scan may not be what happens in reality?
>>
>> Cheers,
>> -Kristoffer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
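For readers following the thread: the pattern Ted confirms as possible (option #1) is usually done with `mapPartitions` rather than `map`, so that one HBase connection is opened per partition and keys are sent as batched multi-gets instead of one RPC per record. Below is a minimal, hedged sketch of that batching shape. A plain in-memory `Map` stands in for the HBase table so the snippet is self-contained; in real code you would replace `store.get(id)` with a `Table.get(gets)` call on a connection created from `ConnectionFactory.createConnection` inside the partition function. All names here (`BatchedLookup`, `lookupPartition`, `store`) are illustrative, not from the thread.

```scala
// Sketch of per-partition batched lookups, the alternative to a full scan.
// The in-memory `store` is a hypothetical stand-in for an HBase table.
object BatchedLookup {
  // Stand-in for HBase: userId -> profile value
  val store: Map[Int, String] = Map(1 -> "alice", 2 -> "bob", 3 -> "carol")

  // Mirrors rdd.mapPartitions { iter => ... }: the iterator is the
  // partition's records; each batch would become one multi-get RPC.
  def lookupPartition(userIds: Iterator[Int],
                      batchSize: Int = 100): Iterator[(Int, Option[String])] =
    userIds.grouped(batchSize).flatMap { batch =>
      // In real code: build a List[Get] from `batch`, call table.get(gets),
      // and zip the Results back with the ids.
      batch.map(id => id -> store.get(id))
    }

  def main(args: Array[String]): Unit = {
    // Simulates one partition containing four user ids; id 4 has no row.
    println(lookupPartition(Iterator(1, 2, 3, 4)).toList)
  }
}
```

Because only the ids present in the input RDD are fetched, the cost scales with the size of the RDD rather than the size of the HBase table, which addresses the concern about scanning the full table for a small input.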