No, more of an anti-pattern, really. As we moved our distributed processing from MapReduce to Spark, we moved away from 1 "tool" = 1 "job" = "does one thing" and towards computation on a set of RDDs (or DataFrames) pulled from Accumulo. As Christopher suggested, the right thing to do in that case is to have short-lived, narrowly-scoped Job objects, one per RDD. Before we figured that out, we struggled for a little while with how to reuse a Job object :)
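Here is a minimal sketch of the one-Job-per-RDD idea, written in Java against the Accumulo 1.7 mapreduce API. The class and method names (PerRddJobs, newScanJob) are hypothetical. It assumes `base` already holds the connection settings your driver uses (connector info, instance, authorizations); if it doesn't, make those calls on each new Job instead. Hadoop's Job.getInstance(Configuration) copies the configuration, so iterators added for one RDD never end up in another, and there's nothing to clear.

```java
import java.io.IOException;
import java.util.List;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

/** Hypothetical helper (not part of Accumulo): one fresh, narrowly-scoped Job per RDD. */
public final class PerRddJobs {
  private PerRddJobs() {}

  /**
   * Creates a brand-new Job from a shared base configuration, then adds only
   * the table and iterators needed for this one scan. The Job is never reused.
   */
  public static Job newScanJob(Configuration base, String table, List<IteratorSetting> iterators)
      throws IOException {
    // Job.getInstance(Configuration) works on a copy, so 'base' is left untouched.
    Job job = Job.getInstance(base);
    AccumuloInputFormat.setInputTableName(job, table);
    // addIterator appends to the job's iterator list; on a fresh Job that list starts empty.
    for (IteratorSetting setting : iterators) {
      AccumuloInputFormat.addIterator(job, setting);
    }
    return job;
  }
}
```

Each job's getConfiguration() could then be passed to something like JavaSparkContext.newAPIHadoopRDD(conf, AccumuloInputFormat.class, Key.class, Value.class) to get one RDD per Job.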
IIRC, Christopher discussed a 2.0 client API that used the builder pattern extensively for client configuration. I think that's a very user-friendly approach; we already do something similar today with some builder classes in Scala around AbstractInputFormat and InputFormatBase, and it makes the code very readable.

-Russ

On Thu, Aug 18, 2016 at 8:36 AM Josh Elser <[email protected]> wrote:
> bq. Note that the method isn't idempotent. To clear the iterators on a
> job you can call
> job.getConfiguration.unset("AccumuloInputFormat.ScanOpts.Iterators")
> (but that isn't officially part of the public API)
>
> Hey Russ -- do you have a use case behind why you know the above trick?
> Is there a reason that we might want to formally support removal of
> iterators from a Job's Configuration?
>
> Christopher wrote:
> > I'm not sure it's really necessary to add the ability to clear. The way I
> > see it, you create the configuration, then execute, create a new
> > configuration, then execute, etc. I'm not sure what the use case is for
> > clearing... if you didn't want it there, why did you add it only to remove
> > it? But it seems to me that clearing indicates a potential misuse, or at
> > least, anti-best-practices. Granted, I don't do a lot of MapReduce these
> > days, maybe I'm missing some significant use case.
> >
> > Russ mentioned the method isn't idempotent. That's true. Additionally, the
> > Scanner will throw an exception if the name and/or priority is already in
> > use. We don't do any up-front deduplication in the job config.
> >
> > On Wed, Aug 17, 2016 at 5:45 PM Josh Elser <[email protected]> wrote:
> >
> >> Sounds like something we should be addressing (the ability to clear
> >> iterators from a Job's configuration)...
> >>
> >> -------- Original Message --------
> >> Subject: Re: Accumulo 1.7 InputFormat Iterator Question
> >> Date: Wed, 17 Aug 2016 21:31:01 +0000
> >> From: Russ Weeks <[email protected]>
> >> Reply-To: [email protected]
> >> To: [email protected]
> >>
> >> Hi, Jamie,
> >>
> >> Try the static method AccumuloInputFormat.addIterator(job, new
> >> IteratorSetting(...)).
> >>
> >> Note that the method isn't idempotent. To clear the iterators on a job
> >> you can call
> >> job.getConfiguration.unset("AccumuloInputFormat.ScanOpts.Iterators") (but
> >> that isn't officially part of the public API)
> >>
> >> -Russ
> >>
> >> On Wed, Aug 17, 2016 at 2:26 PM Jamie Johnson <[email protected]> wrote:
> >>
> >>     I am upgrading from Accumulo 1.6 to 1.7 and I am trying to
> >>     understand how iterators are supposed to be set in 1.7 for an input
> >>     format. In my situation, if a particular property is set, an
> >>     additional iterator needs to be added to do some additional
> >>     checking. Previously I had done this in the
> >>     AbstractRecordReader.setupIterators() method but this has been
> >>     deprecated. I had attempted to put them in
> >>     AbstractRecordReader.contextIterators(), but this isn't always
> >>     called. This change has made me question if I was ever doing this
> >>     according to best practices and now wonder what the correct way to
> >>     do this is. Any pointers would be greatly appreciated.
> >>
> >
>
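For Jamie's case (an extra iterator only when a property is set), one option is to make that decision in the driver while configuring the Job, rather than in the RecordReader. The sketch below is a hypothetical builder in the spirit of the Scala ones mentioned above; ScanJobBuilder, iterator, iteratorIf and build are made-up names, not Accumulo API. Because the job config does no deduplication and the Scanner throws on a repeated name or priority, it rejects duplicates up front. Each build() call returns a fresh Job carrying exactly the iterators collected so far.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

/** Hypothetical builder (not part of Accumulo) that configures one scan Job. */
public final class ScanJobBuilder {
  private final Configuration base;
  private final String table;
  private final List<IteratorSetting> iterators = new ArrayList<>();
  private final Set<String> names = new HashSet<>();
  private final Set<Integer> priorities = new HashSet<>();

  public ScanJobBuilder(Configuration base, String table) {
    this.base = base;
    this.table = table;
  }

  /** Adds an iterator, failing fast if its name or priority is already taken. */
  public ScanJobBuilder iterator(IteratorSetting setting) {
    if (names.contains(setting.getName())) {
      throw new IllegalArgumentException("Duplicate iterator name: " + setting.getName());
    }
    if (priorities.contains(setting.getPriority())) {
      throw new IllegalArgumentException("Duplicate iterator priority: " + setting.getPriority());
    }
    names.add(setting.getName());
    priorities.add(setting.getPriority());
    iterators.add(setting);
    return this;
  }

  /** Adds the iterator only when the condition holds (e.g. a job property is set). */
  public ScanJobBuilder iteratorIf(boolean condition, IteratorSetting setting) {
    return condition ? iterator(setting) : this;
  }

  /** Creates a new Job on each call; the shared base Configuration is copied, not modified. */
  public Job build() throws IOException {
    Job job = Job.getInstance(base);
    AccumuloInputFormat.setInputTableName(job, table);
    for (IteratorSetting setting : iterators) {
      AccumuloInputFormat.addIterator(job, setting);
    }
    return job;
  }
}
```

Usage might look like new ScanJobBuilder(conf, "records").iteratorIf(Boolean.parseBoolean(props.getProperty("extra.checks")), checkSetting).build(), where "records", "extra.checks" and checkSetting stand in for your own table, property and IteratorSetting.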
