And if I may ask, how long does the same query take in the HBase CLI (shell)?
I would not expect Spark to improve the performance of HBase. At best, Spark
will push the filter down to HBase. So I would try to optimise any additional
overhead, such as bringing data into Spark.
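
To make the pushdown point concrete, here is a minimal toy sketch in plain Java. It uses no Spark or HBase classes; the names PushdownSketch, Row, ScanResult, sampleTable, scanThenFilter and scanWithPushdown are all hypothetical, and the "city" column is an assumption, since the original filter was not posted. It only counts how many rows would have to travel to the client in each case, which is the overhead I would look at first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: an in-memory list stands in for the HBase table.
final class PushdownSketch {

    // Hypothetical row shape, loosely modelled on a "customers" table.
    static final class Row {
        final String rowKey;
        final String city;

        Row(String rowKey, String city) {
            this.rowKey = rowKey;
            this.city = city;
        }
    }

    // The matching rows, plus how many rows crossed the "wire" to get them.
    static final class ScanResult {
        final List<Row> matches;
        final int rowsTransferred;

        ScanResult(List<Row> matches, int rowsTransferred) {
            this.matches = matches;
            this.rowsTransferred = rowsTransferred;
        }
    }

    // Builds n rows; every 100th row has city "Pune", so the filter is selective.
    static List<Row> sampleTable(int n) {
        List<Row> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new Row("row-" + i, i % 100 == 0 ? "Pune" : "Other"));
        }
        return rows;
    }

    // No pushdown: every row is shipped to the client, which then filters.
    static ScanResult scanThenFilter(List<Row> table, Predicate<Row> filter) {
        List<Row> shipped = new ArrayList<>(table); // whole table is transferred
        List<Row> matches = new ArrayList<>();
        for (Row r : shipped) {
            if (filter.test(r)) {
                matches.add(r);
            }
        }
        return new ScanResult(matches, shipped.size());
    }

    // Pushdown: the filter runs where the data lives; only matches are shipped.
    static ScanResult scanWithPushdown(List<Row> table, Predicate<Row> filter) {
        List<Row> matches = new ArrayList<>();
        for (Row r : table) {
            if (filter.test(r)) {
                matches.add(r);
            }
        }
        return new ScanResult(matches, matches.size());
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable(350_000); // roughly the table size in the question
        Predicate<Row> inPune = r -> "Pune".equals(r.city);

        ScanResult noPushdown = scanThenFilter(table, inPune);
        ScanResult pushdown = scanWithPushdown(table, inPune);

        System.out.println("no pushdown: transferred=" + noPushdown.rowsTransferred
                + " matches=" + noPushdown.matches.size());
        System.out.println("pushdown:    transferred=" + pushdown.rowsTransferred
                + " matches=" + pushdown.matches.size());
    }
}
```

Both paths return the same matches; the difference is only in the number of rows transferred, which is why the selectivity of the filter matters. Whether the DataFrame in the original post actually pushes its filter down depends on how getRddCustomers() builds the RDD, which I cannot tell from the snippet.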
On 1 May 2015 00:56, "Ted Yu" <yuzhih...@gmail.com> wrote:

> bq. a single query on one filter criteria
>
> Can you tell us more about your filter? How selective is it?
>
> Which HBase release are you using?
>
> Cheers
>
> On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <
> siddharth.ub...@syncoms.com> wrote:
>
>>  Hi,
>>
>>
>>
>> I want to use Spark as a query engine on HBase with sub-second latency.
>>
>>
>>
>> I am using Spark 1.3, and followed the steps below on an HBase table
>> with around 3.5 lakh (350,000) rows:
>>
>>
>>
>> 1. Mapped the DataFrame to the HBase table. RDDCustomers maps to the
>> HBase table and is used to create the DataFrame.
>>
>> "DataFrame schemaCustomers = sqlInstance
>>         .createDataFrame(SparkContextImpl.getRddCustomers(),
>>                 Customers.class);"
>>
>> 2. Used registerTempTable, i.e.
>> "schemaCustomers.registerTempTable("customers");"
>>
>> 3. Ran the query on the DataFrame using the SQLContext instance.
>>
>>
>>
>> What I am observing is that a single query on one filter criteria is
>> taking 7-8 seconds, and the time increases as I increase the number of
>> rows in the HBase table. Also, there was one time when I got a query
>> response in 1-2 seconds. This seems like strange behavior.
>>
>> Is this expected behavior from Spark or am I missing something here?
>>
>> Can somebody help me understand this scenario? Please assist.
>>
>>
>>
>> Thanks,
>>
>> Siddharth Ubale,
>>
>>
>>
>
>
