Hi all, @Janos: The patch didn't go through but I get the idea. FYI: in most apache lists attachments are not allowed
Since people do not have strong feelings about this I am inclined to move forward with Janos suggestion and accept all three properties for specifying compactions. I just logged HIVE-25947 about this. Best, Stamatis On Tue, Feb 8, 2022 at 2:45 PM Janos Kovacs <kovja...@gmail.com> wrote: > Hi Stamatis, > > The attached one is a more generic proposal: basically moves out target > queue resolution to CompactorUtil.getCompactorJobQueueName as it is > already in use for the StatsUpdater. > There then the old properties are used based on a new fallback config > prop. Fallback is only for statement (ci.props) and table (t.props) and not > global configuration. > > It's just a mock-up, I didn't even check if it compiles, but shows the > logic which should be good enough for the discussion. > > R, Janos > > > Stamatis Zampetakis <zabe...@gmail.com> ezt írta (időpont: 2022. febr. > 7., H, 23:44): > >> Thanks Janos for the feedback. >> >> If I understand well your suggestion is support all of the properties >> below >> for table level compactions and treat them as equivalent: >> * compactor.mapred.job.queue.name >> * compactor.mapreduce.job.queuename >> * compactor.hive.compactor.job.queue >> >> It is something that crossed my mind as well but I am slightly skeptical >> because like this we explicitly state that people are free to use whatever >> they like. It might also have as a consequence MR properties affecting Tez >> (as it happens a bit with HIVE-25595) which from my perspective is not >> that >> great. I am also thinking that it will lead to more requests for accepting >> these MR specific properties in the query based compactor which cannot >> (and >> probably will never) use MR as the underlying engine. We should also keep >> in mind that the MR engine was deprecated ~6years ago and the MR compactor >> may follow soon. >> >> I am fine implementing this specific change (accepting all properties >> above) as long as someone from the people contributing to the compactor >> confirms it is the desired path going forward. >> >> Best, >> Stamatis >> >> On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs <kovja...@gmail.com> wrote: >> >> > Hi Stamatis, >> > >> > I agree that the [compactor.]*hive.compactor.queue.name >> > <http://hive.compactor.queue.name>* is a better solution as hive now >> also >> > supports query based compaction, not only MR. >> > ...although I think this needs to be backward compatible! >> > >> > What do you think about a logic similar to this: >> > >> > --- >> a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java >> 2022-02-07 10:31:28.000000000 +0100 >> > +++ >> b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java >> 2022-02-07 10:33:25.000000000 +0100 >> > @@ -145,10 +145,19 @@ >> > overrideMRProps(job, t.getParameters()); // override MR properties >> from tblproperties if applicable >> > if (ci.properties != null) { >> > overrideTblProps(job, t.getParameters(), ci.properties); >> > } >> > >> > + // make queue configuration backward compatible >> > + // at that point overrideMRProps and OverrideTblProps already >> consolidated >> > + // the final value, just need to use job.TBALE_PROPS >> > + String queueNameLegacy = >> > + (new >> StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty(" >> compactor.mapred.job.queue.name"); >> > + if (queueNameLegacy != null && queueNameLegacy.length() > 0) { >> > + job.set(ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy); >> > + } >> > + >> > String queueName = HiveConf.getVar(job, >> ConfVars.COMPACTOR_JOB_QUEUE); >> > if (queueName != null && queueName.length() > 0) { >> > job.setQueueName(queueName); >> > } >> > >> > >> > Of course this can be wrapped around with a new config if needed, like >> > hive.compaction.queue.name.use.legacy or whatever... >> > FYI: we might also want to check legacy config not only for *" >> compactor.mapred.job.queue.name >> > <http://compactor.mapred.job.queue.name>"* but also for >> > *"compactor.mapreduce.job.queuename" *as the first one was already on >> the >> > deprecated list as pointed out by Peter Vary. >> > >> > Please also note that the change introduced by HIVE-25595 is currently >> not >> > compatible with the new config as it was developed for the old >> > compactor.mapred... property: >> > >> > >> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31 >> > This also needs to be handled - for both the new prop name and backward >> > compatibility. >> > >> > R, Janos >> > >> > >> > On 2022/01/31 09:50:49 Stamatis Zampetakis wrote: >> > > Hi all, >> > > >> > > This email is an attempt to converge on which Hive/Tez/MR properties >> > > someone should use in order to schedule a compaction on specific >> queues. >> > > For those who are not familiar with how queues are used the YARN >> capacity >> > > scheduler documentation [1] gives the general idea. >> > > >> > > Using specific queues for compaction jobs is necessary to be able to >> > > efficiently allocate resources for maintenance tasks (compaction) and >> > > production workloads. Hive provides various ways to control the queues >> > used >> > > by the compactor and there have been various tickets with improvements >> > and >> > > fixes in this area (see list below). >> > > >> > > The granularity we can select queues for compactions (all tables vs. >> per >> > > table) currently depends on which compactor is in use (MR vs Query >> based) >> > > and boils down to the following properties: >> > > >> > > Global configuration: >> > > * hive.compactor.job.queue >> > > * mapred.job.queue.name >> > > * tez.queue.name >> > > >> > > Per table/statement configuration (table properties): >> > > * compactor.mapred.job.queue.name (before HIVE-20723) >> > > * compactor.hive.compactor.job.queue (after HIVE-20723) >> > > >> > > Things are a bit blurred with respect to what properties someone >> should >> > use >> > > to achieve the desired result. Some changes, such as HIVE-20723, raise >> > > backward compatibility concerns and other changes seem to have a >> larger >> > > impact than the one specifically designed for. For example, after >> > > HIVE-25595, map reduce queue properties can have an impact on the >> > compactor >> > > queues even when Tez is in use. >> > > >> > > In order to avoid confusion and ensure long term support of these >> queue >> > > selection features we should clarify which of the above properties >> should >> > > be used. >> > > >> > > Given the current situation, I would propose to officially support >> only >> > the >> > > following: >> > > * hive.compactor.job.queue >> > > * compactor.hive.compactor.job.queue >> > > and align the implementation based on these (if necessary). In other >> > words, >> > > Hive users should not use mapred.job.queue.name and tez.queue.name >> > > explicitly at least when it comes to the compactor. Hive should set >> them >> > > transparently (as it happens now in various places) based on >> > > [compactor.]hive.compactor.job.queue. >> > > >> > > What do people think? Are there other ideas? >> > > >> > > Best, >> > > Stamatis >> > > >> > > [1] >> > > >> > >> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html >> > > >> > > HIVE-11997: Add ability to send Compaction Jobs to specific queue >> > > HIVE-13354: Add ability to specify Compaction options per table and >> per >> > > request >> > > HIVE-20723: Allow per table specification of compaction yarn queue >> > > HIVE-24781: Allow to use custom queue for query based compaction >> > > HIVE-25801: Custom queue settings is not honoured by Query based >> > compaction >> > > StatsUpdater >> > > HIVE-25595: Custom queue settings is not honoured by compaction >> > StatsUpdater >> > > >> > >> >