Hi all,

@Janos: The patch didn't go through but I get the idea. FYI: in most apache
lists attachments are not allowed

Since people do not have strong feelings about this I am inclined to move
forward with Janos suggestion and accept all three properties for
specifying compactions.

I just logged HIVE-25947 about this.

Best,
Stamatis

On Tue, Feb 8, 2022 at 2:45 PM Janos Kovacs <kovja...@gmail.com> wrote:

> Hi Stamatis,
>
> The attached one is a more generic proposal: basically moves out target
> queue resolution to CompactorUtil.getCompactorJobQueueName as it is
> already in use for the StatsUpdater.
> There then the old properties are used based on a new fallback config
> prop. Fallback is only for statement (ci.props) and table (t.props) and not
> global configuration.
>
> It's just a mock-up, I didn't even check if it compiles, but shows the
> logic which should be good enough for the discussion.
>
> R, Janos
>
>
> Stamatis Zampetakis <zabe...@gmail.com> ezt írta (időpont: 2022. febr.
> 7., H, 23:44):
>
>> Thanks Janos for the feedback.
>>
>> If I understand well your suggestion is support all of the properties
>> below
>> for table level compactions and treat them as equivalent:
>> * compactor.mapred.job.queue.name
>> * compactor.mapreduce.job.queuename
>> * compactor.hive.compactor.job.queue
>>
>> It is something that crossed my mind as well but I am slightly skeptical
>> because like this we explicitly state that people are free to use whatever
>> they like. It might also have as a consequence MR properties affecting Tez
>> (as it happens a bit with HIVE-25595) which from my perspective is not
>> that
>> great. I am also thinking that it will lead to more requests for accepting
>> these MR specific properties in the query based compactor which cannot
>> (and
>> probably will never) use MR as the underlying engine. We should also keep
>> in mind that the MR engine was deprecated ~6years ago and the MR compactor
>> may follow soon.
>>
>> I am fine implementing this specific change (accepting all properties
>> above) as long as someone from the people contributing to the compactor
>> confirms it is the desired path going forward.
>>
>> Best,
>> Stamatis
>>
>> On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs <kovja...@gmail.com> wrote:
>>
>> > Hi Stamatis,
>> >
>> > I agree that the [compactor.]*hive.compactor.queue.name
>> > <http://hive.compactor.queue.name>* is a better solution as hive now
>> also
>> > supports query based compaction, not only MR.
>> > ...although I think this needs to be backward compatible!
>> >
>> > What do you think about a logic similar to this:
>> >
>> > ---
>> a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
>> 2022-02-07 10:31:28.000000000 +0100
>> > +++
>> b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
>> 2022-02-07 10:33:25.000000000 +0100
>> > @@ -145,10 +145,19 @@
>> >      overrideMRProps(job, t.getParameters()); // override MR properties
>> from tblproperties if applicable
>> >      if (ci.properties != null) {
>> >        overrideTblProps(job, t.getParameters(), ci.properties);
>> >      }
>> >
>> > +    // make queue configuration backward compatible
>> > +    // at that point overrideMRProps and OverrideTblProps already
>> consolidated
>> > +    // the final value, just need to use job.TBALE_PROPS
>> > +    String queueNameLegacy =
>> > +      (new
>> StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("
>> compactor.mapred.job.queue.name");
>> > +    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
>> > +      job.set(ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
>> > +    }
>> > +
>> >      String queueName = HiveConf.getVar(job,
>> ConfVars.COMPACTOR_JOB_QUEUE);
>> >      if (queueName != null && queueName.length() > 0) {
>> >        job.setQueueName(queueName);
>> >      }
>> >
>> >
>> > Of course this can be wrapped around with a new config if needed, like
>> > hive.compaction.queue.name.use.legacy or whatever...
>> > FYI: we might also want to check legacy config not only for *"
>> compactor.mapred.job.queue.name
>> > <http://compactor.mapred.job.queue.name>"* but also for
>> > *"compactor.mapreduce.job.queuename" *as the first one was already on
>> the
>> > deprecated list as pointed out by Peter Vary.
>> >
>> > Please also note that the change introduced by HIVE-25595 is currently
>> not
>> > compatible with the new config as it was developed for the old
>> > compactor.mapred... property:
>> >
>> >
>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
>> > This also needs to be handled - for both the new prop name and backward
>> > compatibility.
>> >
>> > R, Janos
>> >
>> >
>> > On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
>> > > Hi all,
>> > >
>> > > This email is an attempt to converge on which Hive/Tez/MR properties
>> > > someone should use in order to schedule a compaction on specific
>> queues.
>> > > For those who are not familiar with how queues are used the YARN
>> capacity
>> > > scheduler documentation [1] gives the general idea.
>> > >
>> > > Using specific queues for compaction jobs is necessary to be able to
>> > > efficiently allocate resources for maintenance tasks (compaction) and
>> > > production workloads. Hive provides various ways to control the queues
>> > used
>> > > by the compactor and there have been various tickets with improvements
>> > and
>> > > fixes in this area (see list below).
>> > >
>> > > The granularity we can select queues for compactions (all tables vs.
>> per
>> > > table) currently depends on which compactor is in use (MR vs Query
>> based)
>> > > and boils down to the following properties:
>> > >
>> > > Global configuration:
>> > > * hive.compactor.job.queue
>> > > * mapred.job.queue.name
>> > > * tez.queue.name
>> > >
>> > > Per table/statement configuration (table properties):
>> > > * compactor.mapred.job.queue.name (before HIVE-20723)
>> > > * compactor.hive.compactor.job.queue (after HIVE-20723)
>> > >
>> > > Things are a bit blurred with respect to what properties someone
>> should
>> > use
>> > > to achieve the desired result. Some changes, such as HIVE-20723, raise
>> > > backward compatibility concerns and other changes seem to have a
>> larger
>> > > impact than the one specifically designed for. For example, after
>> > > HIVE-25595, map reduce queue properties can have an impact on the
>> > compactor
>> > > queues even when Tez is in use.
>> > >
>> > > In order to avoid confusion and ensure long term support of these
>> queue
>> > > selection features we should clarify which of the above properties
>> should
>> > > be used.
>> > >
>> > > Given the current situation, I would propose to officially support
>> only
>> > the
>> > > following:
>> > > * hive.compactor.job.queue
>> > > * compactor.hive.compactor.job.queue
>> > > and align the implementation based on these (if necessary). In other
>> > words,
>> > > Hive users should not use mapred.job.queue.name and tez.queue.name
>> > > explicitly at least when it comes to the compactor. Hive should set
>> them
>> > > transparently (as it happens now in various places) based on
>> > > [compactor.]hive.compactor.job.queue.
>> > >
>> > > What do people think? Are there other ideas?
>> > >
>> > > Best,
>> > > Stamatis
>> > >
>> > > [1]
>> > >
>> >
>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>> > >
>> > > HIVE-11997: Add ability to send Compaction Jobs to specific queue
>> > > HIVE-13354: Add ability to specify Compaction options per table and
>> per
>> > > request
>> > > HIVE-20723: Allow per table specification of compaction yarn queue
>> > > HIVE-24781: Allow to use custom queue for query based compaction
>> > > HIVE-25801: Custom queue settings is not honoured by Query based
>> > compaction
>> > > StatsUpdater
>> > > HIVE-25595: Custom queue settings is not honoured by compaction
>> > StatsUpdater
>> > >
>> >
>>
>

Reply via email to