[
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=712681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712681
]
ASF GitHub Bot logged work on HIVE-25842:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jan/22 08:38
Start Date: 21/Jan/22 08:38
Worklog Time Spent: 10m
Work Description: klcopp commented on a change in pull request #2916:
URL: https://github.com/apache/hive/pull/2916#discussion_r789454143
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##########
@@ -139,157 +92,37 @@ public static DeltaFilesMetricReporter getInstance() {
     return InstanceHolder.instance;
   }
-  public static synchronized void init(HiveConf conf) throws Exception {
-    getInstance().configure(conf);
+  public static synchronized void init(Configuration conf, TxnStore txnHandler) throws Exception {
+    if (!initialized) {
+      getInstance().configure(conf, txnHandler);
+      initialized = true;
+    }
   }
-  private void configure(HiveConf conf) throws Exception {
+  private void configure(Configuration conf, TxnStore txnHandler) throws Exception {
     long reportingInterval =
-        HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-    hiveEntitySeparator = conf.getVar(HiveConf.ConfVars.HIVE_ENTITY_SEPARATOR);
+        MetastoreConf.getTimeVar(conf, MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
+
+    maxCacheSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.METASTORE_DELTAMETRICS_MAX_CACHE_SIZE);
-    initCachesForMetrics(conf);
     initObjectsForMetrics();
     ThreadFactory threadFactory =
         new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
+    reporterExecutorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
+    reporterExecutorService.scheduleAtFixedRate(new ReportingTask(txnHandler), 0, reportingInterval, TimeUnit.SECONDS);
     LOG.info("Started DeltaFilesMetricReporter thread");
Review comment:
Never mind, I had reading problems :D
Issue Time Tracking
-------------------
Worklog Id: (was: 712681)
Time Spent: 6.5h (was: 6h 20m)
> Reimplement delta file metric collection
> ----------------------------------------
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
> Issue Type: Improvement
> Reporter: László Pintér
> Assignee: László Pintér
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table
> (select * and select count( * ) don't update the metrics).
> Metrics aren't updated after compaction or cleaning after compaction, so
> users will probably see "issues" with compaction (like many active or
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries. We tried to put a try-catch
> around each method in DeltaFilesMetricReporter, but of course this isn't
> foolproof. This is a HUGE performance and functionality liability. Tests
> caught some issues, but our tests aren't perfect.
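
The pattern visible in the diff above (a synchronized, run-once init that schedules reporting on a daemon thread, so collection no longer happens in the query path) can be sketched in isolation roughly as follows. This is a minimal illustration, not the Hive implementation: GuardedReporter, configureCalls, shutdown and the Runnable parameter are hypothetical names, and the try-catch inside the scheduled task is an assumption about how one might keep a failing run from cancelling later ones.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a metrics reporter; not the Hive class.
final class GuardedReporter {
  private static boolean initialized = false;
  private static ScheduledExecutorService executor;
  // Counts how many times the scheduler was actually configured.
  static final AtomicInteger configureCalls = new AtomicInteger();

  private GuardedReporter() {}

  // synchronized + flag: repeated or concurrent callers configure only once.
  static synchronized void init(long intervalSeconds, Runnable collectMetrics) {
    if (initialized) {
      return;
    }
    configureCalls.incrementAndGet();
    ThreadFactory daemonFactory = r -> {
      Thread t = new Thread(r, "guarded-reporter");
      t.setDaemon(true); // a metrics thread should not keep the JVM alive
      return t;
    };
    executor = Executors.newSingleThreadScheduledExecutor(daemonFactory);
    executor.scheduleAtFixedRate(() -> {
      try {
        collectMetrics.run();
      } catch (RuntimeException e) {
        // An uncaught exception would suppress all later scheduled runs,
        // so log it and let the next tick try again.
        System.err.println("metric collection failed: " + e);
      }
    }, 0, intervalSeconds, TimeUnit.SECONDS);
    initialized = true;
  }

  // Stops the background task and allows init to run again (useful in tests).
  static synchronized void shutdown() {
    if (executor != null) {
      executor.shutdownNow();
      executor = null;
    }
    initialized = false;
  }
}
```

Calling GuardedReporter.init twice in a row configures the executor only once; the second call returns immediately because the flag is already set.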