Hi Stamatis,

The attached one is a more generic proposal: basically moves out target
queue resolution to CompactorUtil.getCompactorJobQueueName as it is already
in use for the StatsUpdater.
There then the old properties are used based on a new fallback config prop.
Fallback is only for statement (ci.props) and table (t.props) and not
global configuration.

It's just a mock-up, I didn't even check if it compiles, but shows the
logic which should be good enough for the discussion.

R, Janos


Stamatis Zampetakis <zabe...@gmail.com> ezt írta (időpont: 2022. febr. 7.,
H, 23:44):

> Thanks Janos for the feedback.
>
> If I understand well your suggestion is support all of the properties below
> for table level compactions and treat them as equivalent:
> * compactor.mapred.job.queue.name
> * compactor.mapreduce.job.queuename
> * compactor.hive.compactor.job.queue
>
> It is something that crossed my mind as well but I am slightly skeptical
> because like this we explicitly state that people are free to use whatever
> they like. It might also have as a consequence MR properties affecting Tez
> (as it happens a bit with HIVE-25595) which from my perspective is not that
> great. I am also thinking that it will lead to more requests for accepting
> these MR specific properties in the query based compactor which cannot (and
> probably will never) use MR as the underlying engine. We should also keep
> in mind that the MR engine was deprecated ~6years ago and the MR compactor
> may follow soon.
>
> I am fine implementing this specific change (accepting all properties
> above) as long as someone from the people contributing to the compactor
> confirms it is the desired path going forward.
>
> Best,
> Stamatis
>
> On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs <kovja...@gmail.com> wrote:
>
> > Hi Stamatis,
> >
> > I agree that the [compactor.]*hive.compactor.queue.name
> > <http://hive.compactor.queue.name>* is a better solution as hive now
> also
> > supports query based compaction, not only MR.
> > ...although I think this needs to be backward compatible!
> >
> > What do you think about a logic similar to this:
> >
> > ---
> a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> 2022-02-07 10:31:28.000000000 +0100
> > +++
> b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> 2022-02-07 10:33:25.000000000 +0100
> > @@ -145,10 +145,19 @@
> >      overrideMRProps(job, t.getParameters()); // override MR properties
> from tblproperties if applicable
> >      if (ci.properties != null) {
> >        overrideTblProps(job, t.getParameters(), ci.properties);
> >      }
> >
> > +    // make queue configuration backward compatible
> > +    // at that point overrideMRProps and OverrideTblProps already
> consolidated
> > +    // the final value, just need to use job.TBALE_PROPS
> > +    String queueNameLegacy =
> > +      (new
> StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("
> compactor.mapred.job.queue.name");
> > +    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
> > +      job.set(ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
> > +    }
> > +
> >      String queueName = HiveConf.getVar(job,
> ConfVars.COMPACTOR_JOB_QUEUE);
> >      if (queueName != null && queueName.length() > 0) {
> >        job.setQueueName(queueName);
> >      }
> >
> >
> > Of course this can be wrapped around with a new config if needed, like
> > hive.compaction.queue.name.use.legacy or whatever...
> > FYI: we might also want to check legacy config not only for *"
> compactor.mapred.job.queue.name
> > <http://compactor.mapred.job.queue.name>"* but also for
> > *"compactor.mapreduce.job.queuename" *as the first one was already on the
> > deprecated list as pointed out by Peter Vary.
> >
> > Please also note that the change introduced by HIVE-25595 is currently
> not
> > compatible with the new config as it was developed for the old
> > compactor.mapred... property:
> >
> >
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
> > This also needs to be handled - for both the new prop name and backward
> > compatibility.
> >
> > R, Janos
> >
> >
> > On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> > > Hi all,
> > >
> > > This email is an attempt to converge on which Hive/Tez/MR properties
> > > someone should use in order to schedule a compaction on specific
> queues.
> > > For those who are not familiar with how queues are used the YARN
> capacity
> > > scheduler documentation [1] gives the general idea.
> > >
> > > Using specific queues for compaction jobs is necessary to be able to
> > > efficiently allocate resources for maintenance tasks (compaction) and
> > > production workloads. Hive provides various ways to control the queues
> > used
> > > by the compactor and there have been various tickets with improvements
> > and
> > > fixes in this area (see list below).
> > >
> > > The granularity we can select queues for compactions (all tables vs.
> per
> > > table) currently depends on which compactor is in use (MR vs Query
> based)
> > > and boils down to the following properties:
> > >
> > > Global configuration:
> > > * hive.compactor.job.queue
> > > * mapred.job.queue.name
> > > * tez.queue.name
> > >
> > > Per table/statement configuration (table properties):
> > > * compactor.mapred.job.queue.name (before HIVE-20723)
> > > * compactor.hive.compactor.job.queue (after HIVE-20723)
> > >
> > > Things are a bit blurred with respect to what properties someone should
> > use
> > > to achieve the desired result. Some changes, such as HIVE-20723, raise
> > > backward compatibility concerns and other changes seem to have a larger
> > > impact than the one specifically designed for. For example, after
> > > HIVE-25595, map reduce queue properties can have an impact on the
> > compactor
> > > queues even when Tez is in use.
> > >
> > > In order to avoid confusion and ensure long term support of these queue
> > > selection features we should clarify which of the above properties
> should
> > > be used.
> > >
> > > Given the current situation, I would propose to officially support only
> > the
> > > following:
> > > * hive.compactor.job.queue
> > > * compactor.hive.compactor.job.queue
> > > and align the implementation based on these (if necessary). In other
> > words,
> > > Hive users should not use mapred.job.queue.name and tez.queue.name
> > > explicitly at least when it comes to the compactor. Hive should set
> them
> > > transparently (as it happens now in various places) based on
> > > [compactor.]hive.compactor.job.queue.
> > >
> > > What do people think? Are there other ideas?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> >
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> > >
> > > HIVE-11997: Add ability to send Compaction Jobs to specific queue
> > > HIVE-13354: Add ability to specify Compaction options per table and per
> > > request
> > > HIVE-20723: Allow per table specification of compaction yarn queue
> > > HIVE-24781: Allow to use custom queue for query based compaction
> > > HIVE-25801: Custom queue settings is not honoured by Query based
> > compaction
> > > StatsUpdater
> > > HIVE-25595: Custom queue settings is not honoured by compaction
> > StatsUpdater
> > >
> >
>

Reply via email to