Ayan:
Please read this:
http://hbase.apache.org/book.html#cp
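
That chapter covers the coprocessor framework. To give a rough feel for it,
here is a minimal RegionObserver sketch (this assumes the HBase 1.x
coprocessor API; the merge logic itself is only indicated in comments):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Illustrative only: intercepts each Put inside the region server, so the
// "compare with the existing row, write only what changed" logic could run
// server side instead of in the client.
public class UpsertObserver extends BaseRegionObserver {
  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx, Put put,
                     WALEdit edit, Durability durability) throws IOException {
    Result existing = ctx.getEnvironment().getRegion().get(new Get(put.getRow()));
    if (existing.isEmpty()) {
      return; // no stored row yet: let the Put simply create the record
    }
    // Here one would compare 'existing' with 'put' cell by cell and drop the
    // cells whose values are unchanged, so only real updates are written.
  }
}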

Cheers

On Thu, Sep 3, 2015 at 2:13 PM, ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> Thanks for your comments. My driving point is that instead of loading the
> HBase data entirely, I want to do a record-by-record lookup, and that is
> best done in a UDF or map function. I would also have loved to do it in
> Spark, but we have no production Spark cluster here yet :(
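>
> Roughly what I have in mind is sketched below. This is only an
> illustration: it assumes the plain HBase 1.x client API, and the table,
> column family, and qualifier names are placeholders.
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
>
> // Pig UDF sketch: one HBase Get per input record, reusing a single
> // connection across calls instead of reconnecting every time.
> public class HBaseLookup extends EvalFunc<String> {
>   private Connection connection;
>   private Table table;
>
>   @Override
>   public String exec(Tuple input) throws IOException {
>     if (input == null || input.size() == 0) return null;
>     if (connection == null) {
>       Configuration conf = HBaseConfiguration.create();
>       connection = ConnectionFactory.createConnection(conf);
>       table = connection.getTable(TableName.valueOf("my_table")); // placeholder
>     }
>     String rowKey = (String) input.get(0);
>     Result result = table.get(new Get(Bytes.toBytes(rowKey)));
>     if (result.isEmpty()) return null; // record does not exist in HBase
>     return Bytes.toString(result.getValue(Bytes.toBytes("cf"),    // placeholder family
>                                           Bytes.toBytes("col"))); // placeholder qualifier
>   }
> }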
>
> @Franke: I do not have enough experience with coprocessors, so I am not
> able to visualize the solution you are suggesting. It would be really
> helpful if you could shed some more light on it.
>
> Best
> Ayan
>
> On Fri, Sep 4, 2015 at 1:44 AM, Tao Lu <taolu2...@gmail.com> wrote:
>
>> But I don't see how that works here with Phoenix or an HBase coprocessor.
>> Remember, we are joining two big data sets here: one is the big file in
>> HDFS, the other is the set of records in HBase. The driving force comes
>> from the Hadoop cluster.
>>
>>
>>
>>
>> On Thu, Sep 3, 2015 at 11:37 AM, Jörn Franke <jornfra...@gmail.com>
>> wrote:
>>
>>> If you use Pig or Spark, you increase the complexity significantly from an
>>> operations-management perspective. Whether Spark makes sense should be
>>> judged from a platform perspective. If you can do it directly with
>>> HBase/Phoenix, or with an HBase coprocessor alone, then that should be
>>> preferred. Otherwise you pay more for maintenance and development.
>>>
>>> On Thu, Sep 3, 2015 at 17:16, Tao Lu <taolu2...@gmail.com> wrote:
>>>
>>>> Yes, Ayan, your approach will work.
>>>>
>>>> Alternatively, use Spark and write a Scala/Java function which implements
>>>> the same logic as your Pig UDF.
>>>>
>>>> Both approaches look similar.
>>>>
>>>> Personally, I would go with the Spark solution: it will be slightly
>>>> faster, and easier if you already have a Spark cluster set up on top of
>>>> the Hadoop cluster in your infrastructure. A rough sketch of the idea is
>>>> below.
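>>>>
>>>> This is only a sketch, not a full job: it uses the Java Spark API and
>>>> the HBase 1.x client, and the HDFS path, table, and column names are
>>>> placeholders. The main point is one HBase connection per partition
>>>> rather than per record.
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.TableName;
>>>> import org.apache.hadoop.hbase.client.Connection;
>>>> import org.apache.hadoop.hbase.client.ConnectionFactory;
>>>> import org.apache.hadoop.hbase.client.Get;
>>>> import org.apache.hadoop.hbase.client.Put;
>>>> import org.apache.hadoop.hbase.client.Result;
>>>> import org.apache.hadoop.hbase.client.Table;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>> import org.apache.spark.SparkConf;
>>>> import org.apache.spark.api.java.JavaRDD;
>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>>
>>>> public class HBaseUpsertJob {
>>>>   public static void main(String[] args) {
>>>>     JavaSparkContext sc =
>>>>         new JavaSparkContext(new SparkConf().setAppName("hbase-upsert"));
>>>>     // Placeholder input path; assume lines of the form "key,value".
>>>>     JavaRDD<String> lines = sc.textFile("hdfs:///data/incoming.txt");
>>>>
>>>>     lines.foreachPartition(records -> {
>>>>       // One connection and table handle per partition, not per record.
>>>>       try (Connection conn =
>>>>                ConnectionFactory.createConnection(HBaseConfiguration.create());
>>>>            Table table = conn.getTable(TableName.valueOf("my_table"))) {
>>>>         while (records.hasNext()) {
>>>>           String[] fields = records.next().split(",");
>>>>           byte[] rowKey = Bytes.toBytes(fields[0]);
>>>>           byte[] incoming = Bytes.toBytes(fields[1]);
>>>>           Result existing = table.get(new Get(rowKey));
>>>>           byte[] current =
>>>>               existing.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
>>>>           // Write only when the row is new or the stored value differs.
>>>>           if (current == null || !Bytes.equals(current, incoming)) {
>>>>             Put put = new Put(rowKey);
>>>>             put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), incoming);
>>>>             table.put(put);
>>>>           }
>>>>         }
>>>>       }
>>>>     });
>>>>     sc.stop();
>>>>   }
>>>> }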
>>>>
>>>> Cheers,
>>>> Tao
>>>>
>>>>
>>>> On Thu, Sep 3, 2015 at 1:15 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Thanks for your info. I am planning to implement a Pig UDF to do the
>>>>> record lookups. Kindly let me know if this is a good idea.
>>>>>
>>>>> Best
>>>>> Ayan
>>>>>
>>>>> On Thu, Sep 3, 2015 at 2:55 PM, Jörn Franke <jornfra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> You may check whether it makes sense to write a coprocessor that does
>>>>>> the upsert for you, if one does not exist already. Maybe Phoenix for
>>>>>> HBase supports this already.
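>>>>>>
>>>>>> If Phoenix is an option, its UPSERT statement already gives
>>>>>> insert-or-update semantics in one statement. A small JDBC sketch (the
>>>>>> ZooKeeper quorum and table name below are made up) would look roughly
>>>>>> like this:
>>>>>>
>>>>>> import java.sql.Connection;
>>>>>> import java.sql.DriverManager;
>>>>>> import java.sql.PreparedStatement;
>>>>>> import java.sql.SQLException;
>>>>>>
>>>>>> public class PhoenixUpsert {
>>>>>>   public static void main(String[] args) throws SQLException {
>>>>>>     // "zkhost" is a placeholder for the cluster's ZooKeeper quorum.
>>>>>>     try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost");
>>>>>>          PreparedStatement ps = conn.prepareStatement(
>>>>>>              "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) { // made-up table
>>>>>>       ps.setString(1, "row-1");
>>>>>>       ps.setString(2, "new value");
>>>>>>       ps.executeUpdate();
>>>>>>       conn.commit(); // Phoenix connections do not auto-commit by default
>>>>>>     }
>>>>>>   }
>>>>>> }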
>>>>>>
>>>>>> Another alternative, if the records do not have a unique id, is to put
>>>>>> them into a text index engine such as Solr or Elasticsearch, which in
>>>>>> that case does fast matching with relevancy scores.
>>>>>>
>>>>>>
>>>>>> You can also use Spark and Pig there. However, I am not sure Spark is
>>>>>> suitable for these one-row lookups. The same holds for Pig.
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 2, 2015 at 23:53, ayan guha <guha.a...@gmail.com> wrote:
>>>>>>
>>>>>> Hello group
>>>>>>
>>>>>> I am trying to use Pig or Spark to achieve the following:
>>>>>>
>>>>>> 1. Write a batch process which reads records from a file.
>>>>>> 2. Look up HBase to see whether each record exists. If so, compare the
>>>>>> incoming values with HBase and update the fields that do not match;
>>>>>> otherwise, create a new record. (Step 2 is sketched below.)
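>>>>>>
>>>>>> Just to make step 2 concrete, here is a rough sketch with the plain
>>>>>> HBase 1.x Java client (the column family and the way fields are passed
>>>>>> in are placeholders):
>>>>>>
>>>>>> import java.io.IOException;
>>>>>> import org.apache.hadoop.hbase.client.Get;
>>>>>> import org.apache.hadoop.hbase.client.Put;
>>>>>> import org.apache.hadoop.hbase.client.Result;
>>>>>> import org.apache.hadoop.hbase.client.Table;
>>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>>
>>>>>> // Merge one incoming record into the existing HBase row: write only the
>>>>>> // fields that are new or different; if the row is missing, the same Put
>>>>>> // simply creates it.
>>>>>> public class RecordMerger {
>>>>>>   private static final byte[] CF = Bytes.toBytes("cf"); // placeholder family
>>>>>>
>>>>>>   public static void merge(Table table, byte[] rowKey,
>>>>>>                            String[] fieldNames, String[] fieldValues)
>>>>>>       throws IOException {
>>>>>>     Result existing = table.get(new Get(rowKey));
>>>>>>     Put put = new Put(rowKey);
>>>>>>     for (int i = 0; i < fieldNames.length; i++) {
>>>>>>       byte[] qualifier = Bytes.toBytes(fieldNames[i]);
>>>>>>       byte[] incoming = Bytes.toBytes(fieldValues[i]);
>>>>>>       byte[] current = existing.getValue(CF, qualifier);
>>>>>>       if (current == null || !Bytes.equals(current, incoming)) {
>>>>>>         put.addColumn(CF, qualifier, incoming);
>>>>>>       }
>>>>>>     }
>>>>>>     if (!put.isEmpty()) {
>>>>>>       table.put(put);
>>>>>>     }
>>>>>>   }
>>>>>> }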
>>>>>>
>>>>>> My questions:
>>>>>> 1. Is this a good use case for Pig or Spark?
>>>>>> 2. Is there any way to read HBase for each incoming record in Pig
>>>>>> without writing MapReduce code?
>>>>>> 3. In the case of Spark, I think we have to connect to HBase for every
>>>>>> record. Is there any other way?
>>>>>> 4. What is the best HBase connector that gives this functionality?
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> Ayan
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ------------------------------------------------
>>>> Thanks!
>>>> Tao
>>>>
>>>
>>
>>
>> --
>> ------------------------------------------------
>> Thanks!
>> Tao
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
