Hi Stamatis, The attached one is a more generic proposal: basically moves out target queue resolution to CompactorUtil.getCompactorJobQueueName as it is already in use for the StatsUpdater. There then the old properties are used based on a new fallback config prop. Fallback is only for statement (ci.props) and table (t.props) and not global configuration.
It's just a mock-up, I didn't even check if it compiles, but shows the logic which should be good enough for the discussion. R, Janos Stamatis Zampetakis <zabe...@gmail.com> ezt írta (időpont: 2022. febr. 7., H, 23:44): > Thanks Janos for the feedback. > > If I understand well your suggestion is support all of the properties below > for table level compactions and treat them as equivalent: > * compactor.mapred.job.queue.name > * compactor.mapreduce.job.queuename > * compactor.hive.compactor.job.queue > > It is something that crossed my mind as well but I am slightly skeptical > because like this we explicitly state that people are free to use whatever > they like. It might also have as a consequence MR properties affecting Tez > (as it happens a bit with HIVE-25595) which from my perspective is not that > great. I am also thinking that it will lead to more requests for accepting > these MR specific properties in the query based compactor which cannot (and > probably will never) use MR as the underlying engine. We should also keep > in mind that the MR engine was deprecated ~6years ago and the MR compactor > may follow soon. > > I am fine implementing this specific change (accepting all properties > above) as long as someone from the people contributing to the compactor > confirms it is the desired path going forward. > > Best, > Stamatis > > On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs <kovja...@gmail.com> wrote: > > > Hi Stamatis, > > > > I agree that the [compactor.]*hive.compactor.queue.name > > <http://hive.compactor.queue.name>* is a better solution as hive now > also > > supports query based compaction, not only MR. > > ...although I think this needs to be backward compatible! > > > > What do you think about a logic similar to this: > > > > --- > a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 2022-02-07 10:31:28.000000000 +0100 > > +++ > b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 2022-02-07 10:33:25.000000000 +0100 > > @@ -145,10 +145,19 @@ > > overrideMRProps(job, t.getParameters()); // override MR properties > from tblproperties if applicable > > if (ci.properties != null) { > > overrideTblProps(job, t.getParameters(), ci.properties); > > } > > > > + // make queue configuration backward compatible > > + // at that point overrideMRProps and OverrideTblProps already > consolidated > > + // the final value, just need to use job.TBALE_PROPS > > + String queueNameLegacy = > > + (new > StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty(" > compactor.mapred.job.queue.name"); > > + if (queueNameLegacy != null && queueNameLegacy.length() > 0) { > > + job.set(ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy); > > + } > > + > > String queueName = HiveConf.getVar(job, > ConfVars.COMPACTOR_JOB_QUEUE); > > if (queueName != null && queueName.length() > 0) { > > job.setQueueName(queueName); > > } > > > > > > Of course this can be wrapped around with a new config if needed, like > > hive.compaction.queue.name.use.legacy or whatever... > > FYI: we might also want to check legacy config not only for *" > compactor.mapred.job.queue.name > > <http://compactor.mapred.job.queue.name>"* but also for > > *"compactor.mapreduce.job.queuename" *as the first one was already on the > > deprecated list as pointed out by Peter Vary. > > > > Please also note that the change introduced by HIVE-25595 is currently > not > > compatible with the new config as it was developed for the old > > compactor.mapred... property: > > > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31 > > This also needs to be handled - for both the new prop name and backward > > compatibility. > > > > R, Janos > > > > > > On 2022/01/31 09:50:49 Stamatis Zampetakis wrote: > > > Hi all, > > > > > > This email is an attempt to converge on which Hive/Tez/MR properties > > > someone should use in order to schedule a compaction on specific > queues. > > > For those who are not familiar with how queues are used the YARN > capacity > > > scheduler documentation [1] gives the general idea. > > > > > > Using specific queues for compaction jobs is necessary to be able to > > > efficiently allocate resources for maintenance tasks (compaction) and > > > production workloads. Hive provides various ways to control the queues > > used > > > by the compactor and there have been various tickets with improvements > > and > > > fixes in this area (see list below). > > > > > > The granularity we can select queues for compactions (all tables vs. > per > > > table) currently depends on which compactor is in use (MR vs Query > based) > > > and boils down to the following properties: > > > > > > Global configuration: > > > * hive.compactor.job.queue > > > * mapred.job.queue.name > > > * tez.queue.name > > > > > > Per table/statement configuration (table properties): > > > * compactor.mapred.job.queue.name (before HIVE-20723) > > > * compactor.hive.compactor.job.queue (after HIVE-20723) > > > > > > Things are a bit blurred with respect to what properties someone should > > use > > > to achieve the desired result. Some changes, such as HIVE-20723, raise > > > backward compatibility concerns and other changes seem to have a larger > > > impact than the one specifically designed for. For example, after > > > HIVE-25595, map reduce queue properties can have an impact on the > > compactor > > > queues even when Tez is in use. > > > > > > In order to avoid confusion and ensure long term support of these queue > > > selection features we should clarify which of the above properties > should > > > be used. > > > > > > Given the current situation, I would propose to officially support only > > the > > > following: > > > * hive.compactor.job.queue > > > * compactor.hive.compactor.job.queue > > > and align the implementation based on these (if necessary). In other > > words, > > > Hive users should not use mapred.job.queue.name and tez.queue.name > > > explicitly at least when it comes to the compactor. Hive should set > them > > > transparently (as it happens now in various places) based on > > > [compactor.]hive.compactor.job.queue. > > > > > > What do people think? Are there other ideas? > > > > > > Best, > > > Stamatis > > > > > > [1] > > > > > > https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html > > > > > > HIVE-11997: Add ability to send Compaction Jobs to specific queue > > > HIVE-13354: Add ability to specify Compaction options per table and per > > > request > > > HIVE-20723: Allow per table specification of compaction yarn queue > > > HIVE-24781: Allow to use custom queue for query based compaction > > > HIVE-25801: Custom queue settings is not honoured by Query based > > compaction > > > StatsUpdater > > > HIVE-25595: Custom queue settings is not honoured by compaction > > StatsUpdater > > > > > >