Re: Accumulo 1.7 InputFormat Iterator Question

Russ Weeks Thu, 18 Aug 2016 09:39:15 -0700

No, more of an anti-pattern really. As we moved our distributed stuff from
MR to Spark we moved away from 1 "tool" = 1 "job" = "does one thing" and
towards computation on a set of RDDs (or DataFrames) pulled from Accumulo.
As Christopher suggested, the right thing to do in that case is to have
short-lived, narrowly-scoped Job objects, one per RDD. Prior to figuring
that out, we struggled for a little while with how to re-use a Job object :)
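
To make the "one short-lived configuration per RDD" idea concrete, here is a
minimal sketch. It does not use the Accumulo or Hadoop classes at all:
FakeJobConf, OneJobPerRdd, confFor and the iterator names are hypothetical
stand-ins that only model the append-on-every-call behaviour discussed in
this thread, so treat it as an illustration under those assumptions rather
than as real client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job configuration; NOT the Accumulo/Hadoop API.
final class FakeJobConf {
    private final List<String> iterators = new ArrayList<>();

    // Models the non-idempotent add described above: every call appends an entry.
    void addIterator(int priority, String name) {
        iterators.add(priority + ":" + name);
    }

    // Defensive copy so callers cannot mutate the configuration's state.
    List<String> iterators() {
        return new ArrayList<>(iterators);
    }
}

final class OneJobPerRdd {
    // Build a fresh, narrowly scoped configuration for each dataset
    // instead of reusing one object and clearing it between uses.
    static FakeJobConf confFor(boolean extraCheck) {
        FakeJobConf conf = new FakeJobConf();
        conf.addIterator(20, "baseFilter");
        if (extraCheck) {
            conf.addIterator(30, "extraCheck"); // only added when the property is set
        }
        return conf;
    }

    public static void main(String[] args) {
        FakeJobConf reused = new FakeJobConf();
        reused.addIterator(20, "baseFilter");
        reused.addIterator(20, "baseFilter"); // second call appends a duplicate entry
        System.out.println(reused.iterators());

        System.out.println(confFor(false).iterators());
        System.out.println(confFor(true).iterators());
    }
}
```

With a fresh object per dataset there is never anything to clear, which is
the point Christopher makes in the quoted message below.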


IIRC Christopher discussed a 2.0 client API that used the builder pattern
pretty extensively for client configuration. I think that's a very
user-friendly approach. We've done that today with some builder classes in
Scala around AbstractInputFormat and InputFormatBase, and it makes the code
very readable.
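
As a rough illustration of the builder style (in Java rather than Scala, and
not our actual classes or the proposed 2.0 API), something like the sketch
below is what I have in mind. ScanConfigBuilder, forTable, withIterator and
the table and iterator names are all made up for this example. Each step
returns a new builder, so a partially built configuration can be shared
without anyone needing to remove an iterator later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable builder; names are illustrative, not an Accumulo API.
final class ScanConfigBuilder {
    private final String table;
    private final List<String> iterators;

    private ScanConfigBuilder(String table, List<String> iterators) {
        this.table = table;
        this.iterators = Collections.unmodifiableList(iterators);
    }

    static ScanConfigBuilder forTable(String table) {
        return new ScanConfigBuilder(table, new ArrayList<>());
    }

    // Returns a NEW builder; the receiver is left unchanged.
    ScanConfigBuilder withIterator(int priority, String name) {
        List<String> next = new ArrayList<>(iterators);
        next.add(priority + ":" + name);
        return new ScanConfigBuilder(table, next);
    }

    String table() {
        return table;
    }

    List<String> iterators() {
        return iterators;
    }

    public static void main(String[] args) {
        ScanConfigBuilder base = forTable("events").withIterator(20, "baseFilter");
        ScanConfigBuilder checked = base.withIterator(30, "extraCheck");
        System.out.println(base.iterators());    // base still has a single iterator
        System.out.println(checked.iterators()); // derived config has both
    }
}
```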

-Russ

On Thu, Aug 18, 2016 at 8:36 AM Josh Elser <[email protected]> wrote:

> bq. Note that the method isn't idempotent. To clear the iterators on a
> job you can call
> job.getConfiguration.unset("AccumuloInputFormat.ScanOpts.Iterators")
> (but that isn't officially part of the public API)
>
> Hey Russ -- do you have a use case behind why you know the above trick?
> Is there a reason that we might want to formally support removal of
> iterators from a Job's Configuration?
>
> Christopher wrote:
> > I'm not sure it's really necessary to add the ability to clear. The way I
> > see it, you create the configuration, then execute, create a new
> > configuration, then execute, etc. I'm not sure what the use case is for
> > clearing.... if you didn't want it there, why did you add it only to remove
> > it? But it seems to me that clearing indicates a potential misuse, or at
> > least, anti-best-practices. Granted, I don't do a lot of MapReduce these
> > days, maybe I'm missing some significant use case.
> >
> > Russ mentioned the method isn't idempotent. That's true. Additionally, the
> > Scanner will throw an exception if the name and/or priority is already in
> > use. We don't do any up-front deduplication in the job config.
> >
> > On Wed, Aug 17, 2016 at 5:45 PM Josh Elser<[email protected]>  wrote:
> >
> >> Sounds like something we should be addressing (the ability to clear
> >> iterators from a Job's configuration)...
> >>
> >> -------- Original Message --------
> >> Subject:        Re: Accumulo 1.7 InputFormat Iterator Question
> >> Date:   Wed, 17 Aug 2016 21:31:01 +0000
> >> From:   Russ Weeks<[email protected]>
> >> Reply-To:       [email protected]
> >> To:     [email protected]
> >>
> >>
> >>
> >> Hi, Jamie,
> >>
> >> Try the static method AccumuloInputFormat.addIterator(job, new
> >> IteratorSetting(...)).
> >>
> >> Note that the method isn't idempotent. To clear the iterators on a job
> >> you can call
> >> job.getConfiguration.unset("AccumuloInputFormat.ScanOpts.Iterators")
> >> (but that isn't officially part of the public API)
> >>
> >> -Russ
> >>
> >> On Wed, Aug 17, 2016 at 2:26 PM Jamie Johnson<[email protected]
> >> <mailto:[email protected]>>  wrote:
> >>
> >>       I am upgrading from Accumulo 1.6 to 1.7 and I am trying to
> >>       understand how iterators are supposed to be set in 1.7 for an input
> >>       format.  In my situation, if a particular property is set an
> >>       additional iterator needs to be added to do some additional
> >>       checking.  Previously I had done this in the
> >>       AbstractRecordReader.setupIterators() method but this has been
> >>       deprecated.  I had attempted to put them in
> >>       AbstractRecordReader.contextIterators(), but this isn't always
> >>       called.  This change has made me question if I was ever doing this
> >>       according to best practices and now wonder what the correct way to
> >>       do this is.  Any pointers would be greatly appreciated.
> >>
> >>
> >
>
