And if I may ask, how long does it take in the HBase CLI? I would not expect Spark to improve the performance of HBase. At best, Spark will push the filter down to HBase. So I would try to optimise any additional overhead, such as bringing the data into Spark.

On 1 May 2015 00:56, "Ted Yu" <yuzhih...@gmail.com> wrote:
> bq. a single query on one filter criteria
>
> Can you tell us more about your filter? How selective is it?
>
> Which HBase release are you using?
>
> Cheers
>
> On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <
> siddharth.ub...@syncoms.com> wrote:
>
>> Hi,
>>
>> I want to use Spark as a query engine on HBase with sub-second latency.
>>
>> I am using Spark 1.3, and followed the steps below on an HBase table
>> with around 3.5 lakh (350,000) rows:
>>
>> 1. Mapped the DataFrame to the HBase table. RDDCustomers maps to the
>>    HBase table and is used to create the DataFrame:
>>
>>        DataFrame schemaCustomers = sqlInstance
>>            .createDataFrame(SparkContextImpl.getRddCustomers(),
>>                Customers.class);
>>
>> 2. Registered a temp table, i.e.
>>
>>        schemaCustomers.registerTempTable("customers");
>>
>> 3. Ran the query on the DataFrame using the SQLContext instance.
>>
>> What I am observing is that a single query on one filter criterion
>> takes 7-8 seconds, and the time increases as I increase the number of
>> rows in the HBase table. Also, there was one time when I got a query
>> response within 1-2 seconds. This seems like strange behavior.
>>
>> Is this expected behavior from Spark, or am I missing something here?
>>
>> Can somebody help me understand this scenario? Please assist.
>>
>> Thanks,
>>
>> Siddharth Ubale
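
To see where the 7-8 seconds go, and why one run came back in 1-2 seconds, it may help to time the same query several times in one session and compare the first (cold) run with the later (warm) ones. Below is a minimal sketch in plain Java, not a definitive benchmark. `QueryTimer` and `timeRuns` are hypothetical names of my own, and the loop in `main` is only a stand-in workload; in the real setup the `Supplier` would wrap the actual `sqlInstance.sql(...)` call followed by an action such as `count()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Spark or HBase.
public class QueryTimer {

    /** Runs the query `runs` times and returns each run's wall-clock time in milliseconds. */
    public static <T> List<Long> timeRuns(Supplier<T> query, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get(); // the supplier must force execution (e.g. end in an action)
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the real query call.
        Supplier<Long> query = () -> {
            long matches = 0;
            for (int i = 0; i < 350_000; i++) {
                if (i % 7 == 0) {
                    matches++;
                }
            }
            return matches;
        };

        List<Long> timings = timeRuns(query, 5);
        System.out.println("first run (cold): " + timings.get(0) + " ms");
        System.out.println("later runs (warm): " + timings.subList(1, timings.size()) + " ms");
    }
}
```

If the later runs are no faster than the first, most of the time is probably spent reading the table from HBase on every query. In that case, running the same scan and filter directly in HBase gives the baseline to compare against, and the remainder is the overhead of bringing the data into Spark.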