[
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292324#comment-15292324
]
Eugene Koifman commented on HIVE-13354:
---------------------------------------
{quote}
// intentionally set this high so that ttp1 will not trigger major compaction later on
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
{quote}
Could this be moved to where it's actually used? It's confusing at its current location.
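For example, something along these lines (a sketch only; the insert statement and helper names like {{executeStatementOnDriver}} and {{tblName1}} are assumed, not copied from the patch):
{code:java}
// ... ttp2 setup, insert and compaction checks above ...

// Deliberately high warehouse-level threshold: ttp1 must NOT trigger major
// compaction here, while ttp2's tblproperties override (0.5) still does.
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
executeStatementOnDriver("insert into " + tblName1 + " values (4, 'b')", driver);
{code}
That keeps the setting next to the insert it is meant to affect, so the reader doesn't have to scroll back to the top of the test to understand why the threshold is 0.8.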
{quote}
runWorker(conf); // compact ttp2
runWorker(conf); // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" seems suspicious after successful runCleaner()...
Also, perhaps TxnStrore.CLEANING_RESPONSE would be better
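Something like the following (assuming {{TxnStore.CLEANING_RESPONSE}} carries the same literal value the test currently hard-codes):
{code:java}
// Use the shared constant so the assertion tracks any future rename of the state:
Assert.assertEquals(TxnStore.CLEANING_RESPONSE, rsp.getCompacts().get(0).getState());
{code}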
{quote}
// ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly inserted row (actual pct: 0.66)
{quote}
This comment seems wrong. ttp2 had 5 rows which were major compacted into a base. Now 2 more
rows are added, so by row count the delta is 2/5 = 40%, which is below the 0.5 threshold.
Perhaps compaction is triggered anyway because the threshold is computed from file sizes, and
in this case ORC headers make up ~99% of the file size.
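Roughly, the trigger compares file sizes rather than row counts, something like the following (an approximate sketch, not the exact Initiator code; the helper names are made up):
{code:java}
// Approximate shape of the delta-percentage check (byte sizes, not row counts):
long baseSize = sizeOfBaseFiles(dir);    // hypothetical helper
long deltaSize = sizeOfDeltaFiles(dir);  // hypothetical helper
boolean majorCompactionNeeded =
    baseSize > 0 && (float) deltaSize / (float) baseSize > deltaPctThreshold;
// With only a handful of rows, ORC header/footer overhead dominates both files,
// so two freshly inserted rows can still amount to ~66% of the base's byte size.
{code}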
bq. 949 Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(2).getState());

I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after runCleaner(). Why isn't it?
bq. 973 Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));

Why "size4"?
{quote}
1440 void compact(String dbname, String tableName, String partitionName, CompactionType type,
         Map<String, String> tblproperties) throws TException;
{quote}
This is a public API change, so we should probably deprecate the method with the old
signature rather than change it in place.
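Something along these lines (a sketch of the suggestion only; whether the old overload delegates to the new one or keeps its own Thrift call is up to the patch):
{code:java}
/**
 * @deprecated Use {@link #compact(String, String, String, CompactionType, Map)} instead.
 */
@Deprecated
void compact(String dbname, String tableName, String partitionName, CompactionType type)
    throws TException;

/** New overload that carries per-request compaction tblproperties. */
void compact(String dbname, String tableName, String partitionName, CompactionType type,
    Map<String, String> tblproperties) throws TException;
{code}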
{quote}
348 pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID,
CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES,
CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO,
CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column (CC_TBLPROPERTIES) is added here, but the number of "?" placeholders is the
same. How does this work?
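Counting them out: the column list now has 14 entries but VALUES still has only 13 "?", so presumably this statement fails at execution time. The fix would just be one more placeholder (sketch):
{code:java}
pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, "
    + "CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, "
    + "CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) "
    + "VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?,?)");  // 14 placeholders for 14 columns
{code}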
{quote}
714 rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state, " +
715     "cc_tblproperties from COMPLETED_COMPACTIONS order by cc_database, cc_table, " +
716     "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from
history?
etc
> Add ability to specify Compaction options per table and per request
> -------------------------------------------------------------------
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Eugene Koifman
> Assignee: Wei Zheng
> Labels: TODOC2.1
> Attachments: HIVE-13354.1.patch,
> HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently there are a few options that determine when automatic compaction is
> triggered. They are specified once for the whole warehouse.
> This doesn't make sense - some tables may be more important and need to be
> compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore. There is
> currently no way to control job parameters (like memory, for example) except
> to specify them in hive-site.xml for the metastore, which means they are site-wide.
> We should add a way to specify these per table (perhaps even per compaction if
> launched via ALTER TABLE).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)