On 03/02/2012 07:59 PM, Geoffry Roberts wrote:
> Queries are nothing but inserts. Create an object, populate it, persist
> it. If it worked, life would be good right now.
>
> I've considered JDBC and may yet take that approach.

I used MyBatis on a project recently -- also worth considering if you want
a more ORM-like feel to the job.

> re: Hibernate outside of Spring -- I'm getting tired already.
>
> Interesting thing: I use EMF (Eclipse Modeling Framework). The
> supporting jar files for emf and ecore are built into the job. They are
> being found by the Driver(s) and the MR(s) no problemo. If these work,
> why not the hibernate stuff? Mystery!

I wish I knew. :)
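One thing that might be worth trying before giving up on it: push the
Hibernate jars onto the task classpath explicitly from the Driver, via the
distributed cache. A rough sketch only -- it assumes the jars already sit
in HDFS, and the paths and class name below are made-up placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class HibernateJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Add the jars BEFORE constructing the Job, since Job copies the
        // Configuration. The jars must already be in HDFS; these paths
        // are hypothetical.
        DistributedCache.addFileToClassPath(
                new Path("/libs/hibernate-core.jar"), conf);
        DistributedCache.addFileToClassPath(
                new Path("/libs/dom4j.jar"), conf);

        Job job = new Job(conf, "hibernate-inserts");
        // ... set job jar, mapper, reducer, input and output as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If your Driver runs through ToolRunner, the -libjars option should give you
the same effect from the command line.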
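And if you do end up on plain JDBC, an insert-only workload like yours maps
naturally onto a batched PreparedStatement in the Reducer. A minimal
sketch; the connection URL, credentials, and the counts table are all
hypothetical, and the JDBC driver jar has to reach the task classpath too:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the values for each key and writes the result straight to the
// database with a batched PreparedStatement -- no Hibernate involved.
public class JdbcInsertReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private Connection conn;
    private PreparedStatement insert;

    @Override
    protected void setup(Context ctx) throws IOException {
        try {
            // Hypothetical URL, credentials, and table -- substitute your own.
            conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost:5432/mydb", "user", "secret");
            conn.setAutoCommit(false);
            insert = conn.prepareStatement(
                    "INSERT INTO counts (word, total) VALUES (?, ?)");
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
            throws IOException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        try {
            insert.setString(1, key.toString());
            insert.setLong(2, sum);
            insert.addBatch();                 // flushed in cleanup()
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
        try {
            insert.executeBatch();             // one round trip at the end
            conn.commit();
            conn.close();
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}

Batching keeps it to one round trip per task instead of one per record.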
T

> On 2 March 2012 10:50, Tarjei Huse <tar...@scanmine.com> wrote:
>
>> On 03/02/2012 07:31 PM, Geoffry Roberts wrote:
>>> No, I am using 0.21.0 for better performance. I am interested in
>>> DistributedCache so certain libraries can be found during MR
>>> processing. As it is now, I'm getting ClassNotFoundException being
>>> thrown by the Reducers. The Driver throws no error, the Reducer(s)
>>> does. It would seem something is not being distributed across the
>>> cluster as I assumed it would. After all, the whole business is in a
>>> single, executable jar file.
>>
>> How complex are the queries you are doing?
>>
>> Have you considered one of the following:
>>
>> 1) Use plain JDBC instead of integrating Hibernate into Hadoop.
>> 2) Create a local version of the db that can be in the DistributedCache.
>>
>> I tried using Hibernate with Hadoop (the queries were not a significant
>> part of the jobs), but I ran up against so many issues trying to get
>> Hibernate to start up within the MR job that I ended up just exporting
>> the tables, loading them into memory, and doing queries against them
>> with basic HashMap lookups.
>>
>> My best advice is that, if you can, you should find a way to abstract
>> Hibernate away from the job and use something closer to the metal --
>> either JDBC, or just dump the data to files. Getting Hibernate to run
>> outside of Spring and friends can quickly grow tiresome.
>>
>> T
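To make that HashMap workaround concrete, here is roughly the shape it
took -- only a sketch, assuming the table was exported as a tab-separated
file and shipped through the distributed cache; the class name and file
layout are made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Loads a tab-separated table export from the distributed cache into a
// HashMap once per task, then serves all lookups from memory.
public class LookupMapper extends Mapper<Object, Text, Text, Text> {

    private final Map<String, String> table = new HashMap<String, String>();

    @Override
    protected void setup(Context ctx) throws IOException {
        // The driver ships the export with DistributedCache.addCacheFile().
        Path[] cached =
                DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
        BufferedReader in =
                new BufferedReader(new FileReader(cached[0].toString()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", 2);   // key <TAB> value
                table.put(cols[0], cols[1]);
            }
        } finally {
            in.close();
        }
    }

    @Override
    protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String hit = table.get(value.toString().trim());
        if (hit != null) {
            ctx.write(value, new Text(hit));           // emit the joined row
        }
    }
}

The obvious caveat is that the export has to fit in task memory.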
>>> On 2 March 2012 09:46, Kunaal <kunaalbha...@gmail.com> wrote:
>>>
>>>> Are you looking to use DistributedCache for better performance?
>>>>
>>>> On Fri, Mar 2, 2012 at 9:42 AM, Geoffry Roberts
>>>> <geoffry.robe...@gmail.com> wrote:
>>>>
>>>>> This is a tardy response. I'm spread pretty thinly right now.
>>>>>
>>>>> DistributedCache
>>>>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
>>>>> is apparently deprecated. Is there a replacement? I didn't see
>>>>> anything about this in the documentation, but then I am still using
>>>>> 0.21.0. I have to for performance reasons. 1.0.1 is too slow and the
>>>>> client won't have it.
>>>>>
>>>>> Also, the DistributedCache
>>>>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
>>>>> approach seems only to work from within a Hadoop job, i.e. from
>>>>> within a Mapper or a Reducer, but not from within a Driver. I have
>>>>> libraries that I must access from both places. I take it that I am
>>>>> stuck keeping two copies of these libraries in sync -- correct? It's
>>>>> either that, or copy them into HDFS, replacing them all at the
>>>>> beginning of each job run.
>>>>>
>>>>> Looking for best practices.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 28 February 2012 10:17, Owen O'Malley <omal...@apache.org> wrote:
>>>>>
>>>>>> On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts
>>>>>> <geoffry.robe...@gmail.com> wrote:
>>>>>>
>>>>>>> If I create an executable jar file that contains all dependencies
>>>>>>> required by the MR job, do all said dependencies get distributed
>>>>>>> to all nodes?
>>>>>>
>>>>>> You can make a single jar and that will be distributed to all of
>>>>>> the machines that run the task, but it is better in most cases to
>>>>>> use the distributed cache.
>>>>>>
>>>>>> See
>>>>>> http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache
>>>>>>
>>>>>>> If I specify but one reducer, which node in the cluster will the
>>>>>>> reducer run on?
>>>>>>
>>>>>> The scheduling is done by the JobTracker and it isn't possible to
>>>>>> control the location of the reducers.
>>>>>>
>>>>>> -- Owen
>>>>>
>>>>> --
>>>>> Geoffry Roberts
>>>>
>>>> --
>>>> "What we are is the universe's gift to us.
>>>> What we become is our gift to the universe."
>>
>> --
>> Regards / Med vennlig hilsen
>> Tarjei Huse
>> Mobil: 920 63 413

--
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413