[
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292324#comment-15292324
]
Eugene Koifman commented on HIVE-13354:
---------------------------------------
{quote}
// intentionally set this high so that ttp1 will not trigger major compaction later on
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
{quote}
Could this be moved to where it's actually used? It's confusing at its current location.
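For example, something along these lines (a sketch only; the insert statement and helper names like {{executeStatementOnDriver}} and {{tblName1}} are assumed, not copied from the patch):
{code:java}
// ... ttp2 setup, insert and compaction checks above ...

// Deliberately high warehouse-level threshold: ttp1 must NOT trigger major
// compaction here, while ttp2's tblproperties override (0.5) still does.
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
executeStatementOnDriver("insert into " + tblName1 + " values (4, 'b')", driver);
{code}
That keeps the setting next to the insert it is meant to affect, so the reader doesn't have to scroll back to the top of the test to understand why the threshold is 0.8.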
{quote}
runWorker(conf); // compact ttp2
runWorker(conf); // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" seems suspicious after successful runCleaner()...
Also, perhaps TxnStrore.CLEANING_RESPONSE would be better
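Something like the following (assuming {{TxnStore.CLEANING_RESPONSE}} carries the same literal value the test currently hard-codes):
{code:java}
// Use the shared constant so the assertion tracks any future rename of the state:
Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
{code}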
{quote}
// ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly inserted row (actual pct: 0.66)
{quote}
This comment seems wrong. ttp2 had 5 rows which were major compacted into a base. Now 2 more
rows are added, so by row count the delta is 2/5 = 40%, which is below the 0.5 threshold.
Perhaps compaction is triggered anyway because the threshold is computed from file sizes, and
in this case ORC headers make up ~99% of the file size.
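Roughly, the trigger compares file sizes rather than row counts, something like the following (an approximate sketch, not the exact Initiator code; the helper names are made up):
{code:java}
// Approximate shape of the delta-percentage check (byte sizes, not row counts):
long baseSize = sizeOfBaseFiles(dir);    // hypothetical helper
long deltaSize = sizeOfDeltaFiles(dir);  // hypothetical helper
boolean majorCompactionNeeded =
    baseSize > 0 && (float) deltaSize / (float) baseSize > deltaPctThreshold;
// With only a handful of rows, ORC header/footer overhead dominates both files,
// so two freshly inserted rows can still amount to ~66% of the base's byte size.
{code}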
bq. 949 Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(2).getState());

I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after runCleaner(). Why isn't it?
bq. 973 Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));

Why "size4"?
{quote}
1440 void compact(String dbname, String tableName, String partitionName, CompactionType type,
         Map<String, String> tblproperties) throws TException;
{quote}
This is a public API change, so we should probably deprecate the method with the old
signature rather than change it in place.
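Something along these lines (a sketch of the suggestion only; whether the old overload delegates to the new one or keeps its own Thrift call is up to the patch):
{code:java}
/**
 * @deprecated Use {@link #compact(String, String, String, CompactionType, Map)} instead.
 */
@Deprecated
void compact(String dbname, String tableName, String partitionName, CompactionType type)
    throws TException;

/** New overload that carries per-request compaction tblproperties. */
void compact(String dbname, String tableName, String partitionName, CompactionType type,
    Map<String, String> tblproperties) throws TException;
{code}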
{quote}
348 pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID,
CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES,
CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO,
CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column (CC_TBLPROPERTIES) is added here, but the number of "?" placeholders is the
same. How does this work?
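Counting them out: the column list now has 14 entries but VALUES still has only 13 "?", so presumably this statement fails at execution time. The fix would just be one more placeholder (sketch):
{code:java}
pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, "
    + "CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, "
    + "CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) "
    + "VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?,?)");  // 14 placeholders for 14 columns
{code}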
{quote}
714 rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state, " +
715     "cc_tblproperties from COMPLETED_COMPACTIONS order by cc_database, cc_table, " +
716     "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from
history?
etc
> Add ability to specify Compaction options per table and per request
> -------------------------------------------------------------------
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Eugene Koifman
> Assignee: Wei Zheng
> Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch,
> HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently there are a few options that determine when automatic compaction is
> triggered. They are specified once for the whole warehouse.
> This doesn't make sense - some tables may be more important and need to be
> compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore. There is
> currently no way to control job parameters (like memory, for example) except
> to specify them in hive-site.xml for the metastore, which means they are site-wide.
> We should add a way to specify these per table (perhaps even per compaction if
> launched via ALTER TABLE).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)