On 03/02/2012 07:59 PM, Geoffry Roberts wrote:
> Queries are nothing but inserts. Create an object, populate it, persist
> it. If it worked, life would be good right now.
>
> I've considered JDBC and may yet take that approach.

I used MyBatis on a project recently -- also worth considering if you want
a more ORM-like feel to the job.

> re: Hibernate outside of Spring -- I'm getting tired already.
>
> Interesting thing: I use EMF (Eclipse Modeling Framework). The
> supporting jar files for emf and ecore are built into the job. They are
> being found by the Driver(s) and the MR(s) no problemo. If these work,
> why not the hibernate stuff? Mystery!

I wish I knew. :)
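One thing that might be worth trying before giving up on it: push the
Hibernate jars onto the task classpath explicitly from the Driver, via the
distributed cache. A rough sketch only -- it assumes the jars already sit
in HDFS, and the paths and class name below are made-up placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class HibernateJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Add the jars BEFORE constructing the Job, since Job copies the
        // Configuration. The jars must already be in HDFS; these paths
        // are hypothetical.
        DistributedCache.addFileToClassPath(
                new Path("/libs/hibernate-core.jar"), conf);
        DistributedCache.addFileToClassPath(
                new Path("/libs/dom4j.jar"), conf);

        Job job = new Job(conf, "hibernate-inserts");
        // ... set job jar, mapper, reducer, input and output as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If your Driver runs through ToolRunner, the -libjars option should give you
the same effect from the command line.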
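And if you do end up on plain JDBC, an insert-only workload like yours maps
naturally onto a batched PreparedStatement in the Reducer. A minimal
sketch; the connection URL, credentials, and the counts table are all
hypothetical, and the JDBC driver jar has to reach the task classpath too:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the values for each key and writes the result straight to the
// database with a batched PreparedStatement -- no Hibernate involved.
public class JdbcInsertReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private Connection conn;
    private PreparedStatement insert;

    @Override
    protected void setup(Context ctx) throws IOException {
        try {
            // Hypothetical URL, credentials, and table -- substitute your own.
            conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost:5432/mydb", "user", "secret");
            conn.setAutoCommit(false);
            insert = conn.prepareStatement(
                    "INSERT INTO counts (word, total) VALUES (?, ?)");
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
            throws IOException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        try {
            insert.setString(1, key.toString());
            insert.setLong(2, sum);
            insert.addBatch();                 // flushed in cleanup()
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
        try {
            insert.executeBatch();             // one round trip at the end
            conn.commit();
            conn.close();
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}

Batching keeps it to one round trip per task instead of one per record.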
T

> On 2 March 2012 10:50, Tarjei Huse <tar...@scanmine.com> wrote:
>
>> On 03/02/2012 07:31 PM, Geoffry Roberts wrote:
>>> No, I am using 0.21.0 for better performance. I am interested in
>>> DistributedCache so certain libraries can be found during MR
>>> processing. As it is now, I'm getting ClassNotFoundException being
>>> thrown by the Reducers. The Driver throws no error, the Reducer(s)
>>> does. It would seem something is not being distributed across the
>>> cluster as I assumed it would. After all, the whole business is in a
>>> single, executable jar file.
>>
>> How complex are the queries you are doing?
>>
>> Have you considered one of the following:
>>
>> 1) Use plain JDBC instead of integrating Hibernate into Hadoop.
>> 2) Create a local version of the db that can be in the DistributedCache.
>>
>> I tried using Hibernate with Hadoop (the queries were not a significant
>> part of the jobs), but I ran up against so many issues trying to get
>> Hibernate to start up within the MR job that I ended up just exporting
>> the tables, loading them into memory, and doing queries against them
>> with basic HashMap lookups.
>>
>> My best advice is that, if you can, you should find a way to abstract
>> Hibernate away from the job and use something closer to the metal --
>> either JDBC, or just dump the data to files. Getting Hibernate to run
>> outside of Spring and friends can quickly grow tiresome.
>>
>> T
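To make that HashMap workaround concrete, here is roughly the shape it
took -- only a sketch, assuming the table was exported as a tab-separated
file and shipped through the distributed cache; the class name and file
layout are made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Loads a tab-separated table export from the distributed cache into a
// HashMap once per task, then serves all lookups from memory.
public class LookupMapper extends Mapper<Object, Text, Text, Text> {

    private final Map<String, String> table = new HashMap<String, String>();

    @Override
    protected void setup(Context ctx) throws IOException {
        // The driver ships the export with DistributedCache.addCacheFile().
        Path[] cached =
                DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
        BufferedReader in =
                new BufferedReader(new FileReader(cached[0].toString()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", 2);   // key <TAB> value
                table.put(cols[0], cols[1]);
            }
        } finally {
            in.close();
        }
    }

    @Override
    protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String hit = table.get(value.toString().trim());
        if (hit != null) {
            ctx.write(value, new Text(hit));           // emit the joined row
        }
    }
}

The obvious caveat is that the export has to fit in task memory.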
>>> On 2 March 2012 09:46, Kunaal <kunaalbha...@gmail.com> wrote:
>>>
>>>> Are you looking to use DistributedCache for better performance?
>>>>
>>>> On Fri, Mar 2, 2012 at 9:42 AM, Geoffry Roberts
>>>> <geoffry.robe...@gmail.com> wrote:
>>>>
>>>>> This is a tardy response. I'm spread pretty thinly right now.
>>>>>
>>>>> DistributedCache
>>>>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
>>>>> is apparently deprecated. Is there a replacement? I didn't see
>>>>> anything about this in the documentation, but then I am still using
>>>>> 0.21.0. I have to for performance reasons. 1.0.1 is too slow and the
>>>>> client won't have it.
>>>>>
>>>>> Also, the DistributedCache
>>>>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
>>>>> approach seems only to work from within a Hadoop job, i.e. from
>>>>> within a Mapper or a Reducer, but not from within a Driver. I have
>>>>> libraries that I must access from both places. I take it that I am
>>>>> stuck keeping two copies of these libraries in sync -- correct? It's
>>>>> either that, or copy them into HDFS, replacing them all at the
>>>>> beginning of each job run.
>>>>>
>>>>> Looking for best practices.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 28 February 2012 10:17, Owen O'Malley <omal...@apache.org> wrote:
>>>>>
>>>>>> On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts
>>>>>> <geoffry.robe...@gmail.com> wrote:
>>>>>>
>>>>>>> If I create an executable jar file that contains all dependencies
>>>>>>> required by the MR job, do all said dependencies get distributed
>>>>>>> to all nodes?
>>>>>>
>>>>>> You can make a single jar and that will be distributed to all of
>>>>>> the machines that run the task, but it is better in most cases to
>>>>>> use the distributed cache.
>>>>>>
>>>>>> See
>>>>>> http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache
>>>>>>
>>>>>>> If I specify but one reducer, which node in the cluster will the
>>>>>>> reducer run on?
>>>>>>
>>>>>> The scheduling is done by the JobTracker and it isn't possible to
>>>>>> control the location of the reducers.
>>>>>>
>>>>>> -- Owen
>>>>>
>>>>> --
>>>>> Geoffry Roberts
>>>>
>>>> --
>>>> "What we are is the universe's gift to us.
>>>> What we become is our gift to the universe."
>>
>> --
>> Regards / Med vennlig hilsen
>> Tarjei Huse
>> Mobil: 920 63 413

--
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413