[ https://issues.apache.org/jira/browse/HIVE-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529341#comment-16529341 ]
ASF GitHub Bot commented on HIVE-20025:
---------------------------------------

GitHub user sankarh opened a pull request:

    https://github.com/apache/hive/pull/384

    HIVE-20025: Clean-up of event files created by HiveProtoLoggingHook.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sankarh/hive HIVE-20025

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #384

----
commit 52c24baa28ed305f3be2b47f6246ffede0f08e6e
Author: Sankar Hariappan <mailtosankarh@...>
Date:   2018-07-01T17:18:06Z

    HIVE-20025: Clean-up of event files created by HiveProtoLoggingHook.

----

> Clean-up of event files created by HiveProtoLoggingHook.
> --------------------------------------------------------
>
>                 Key: HIVE-20025
>                 URL: https://issues.apache.org/jira/browse/HIVE-20025
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: Hive, hooks, pull-request-available
>             Fix For: 4.0.0
>
>
> Currently, HiveProtoLoggingHook writes event data to HDFS, and the number of
> files can grow very large.
> Since the files are created under a folder whose path includes the date, Hive
> should have a way to clean up data older than a configured time/date. This
> can be a job that runs as infrequently as once a day.
> The retention time should default to 1 week. There should also be a sane
> upper bound on the number of files, so that when a large cluster generates a
> lot of files during a spike, we don't force the cluster to fall over.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
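The retention rule in the description (delete date-partitioned directories older than a configured number of days, defaulting to 7) can be sketched roughly as below. This is an illustrative sketch, not the actual patch: the `date=YYYY-MM-DD` directory-name convention and the `findExpiredDirs` helper are assumptions for the example, and a real cleanup job would list and delete paths through the Hadoop FileSystem API rather than operate on plain strings.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class EventDirRetention {
    // Assumed on-disk layout: one directory per day, named "date=YYYY-MM-DD".
    private static final String PREFIX = "date=";

    /**
     * Return the directory names whose date is strictly older than
     * (today - retentionDays). Names that don't match the expected
     * pattern are skipped rather than deleted, to stay on the safe side.
     */
    public static List<String> findExpiredDirs(List<String> dirNames,
                                               LocalDate today,
                                               int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> expired = new ArrayList<>();
        for (String name : dirNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a date partition, leave it alone
            }
            try {
                LocalDate d = LocalDate.parse(name.substring(PREFIX.length()));
                if (d.isBefore(cutoff)) {
                    expired.add(name);
                }
            } catch (DateTimeParseException e) {
                // Malformed name: skip instead of deleting.
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        // With a 7-day retention evaluated on 2018-07-08, the cutoff is
        // 2018-07-01; only directories strictly before it are expired.
        LocalDate today = LocalDate.parse("2018-07-08");
        List<String> dirs = List.of(
            "date=2018-06-30",  // 8 days old -> expired
            "date=2018-07-01",  // exactly at the cutoff -> kept
            "date=2018-07-07",  // recent -> kept
            "tmp");             // not a date partition -> ignored
        System.out.println(findExpiredDirs(dirs, today, 7));
    }
}
```

A daily scheduled task would then call such a helper on the listing of the event base directory and recursively delete the returned paths; the separate upper bound on total file count mentioned in the description would need an additional count-based check on top of this date rule.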