Re: How to enable compaction for table with external data?

Alan Gates Fri, 25 Sep 2015 09:28:08 -0700

Sorry for the slow response, I missed the email in my inbox.

When you write the data directly using a Storm topology, how are you communicating to Hive that the new data exists? When streaming data in via Hive Streaming, the txn commit tells the system that new data is arriving in that table or partition and thus it should watch for a need to compact that table or partition. Are you doing txn commits via the metastore thrift interface?

Regardless of this, when you've written data in and you manually request a compaction, if there are delta files, the compaction should occur. Can you share the arguments you are passing to the compact call and the output of the SHOW COMPACTIONS you issued afterwards?
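
For illustration, here is a minimal Java sketch of the two things worth checking before a manual request: whether the table or partition directory contains any delta directories at all, and what the compaction statement looks like. The class name, the table name `events`, and the partition column `ds` are hypothetical. The directory pattern assumes the usual ACID naming (delta_<minTxn>_<maxTxn>), and the helper takes plain directory names so it does not depend on any particular filesystem API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class CompactionRequest {
    // Assumed ACID delta directory naming, e.g. delta_0000001_0000010.
    private static final Pattern DELTA_DIR = Pattern.compile("delta_\\d+_\\d+(_\\d+)?");

    /** Counts names that look like delta directories among a directory's children. */
    static long countDeltaDirs(List<String> childNames) {
        return childNames.stream().filter(n -> DELTA_DIR.matcher(n).matches()).count();
    }

    /** Builds ALTER TABLE ... COMPACT; partitionSpec may be null for an unpartitioned table. */
    static String statement(String table, String partitionSpec, String type) {
        if (!type.equals("major") && !type.equals("minor")) {
            throw new IllegalArgumentException("type must be 'major' or 'minor'");
        }
        String partition = (partitionSpec == null || partitionSpec.isEmpty())
                ? "" : " PARTITION (" + partitionSpec + ")";
        return "ALTER TABLE " + table + partition + " COMPACT '" + type + "'";
    }

    public static void main(String[] args) {
        // Hypothetical listing: one plain ORC file and one delta directory.
        List<String> children = Arrays.asList("part-00000.orc", "delta_0000001_0000010");
        System.out.println(countDeltaDirs(children) + " delta dir(s)");
        System.out.println(statement("events", "ds='2015-09-14'", "major"));
    }
}
```

The generated statement could be run through any Hive client, followed by SHOW COMPACTIONS to see the state of the request. If the count of delta directories is zero, the request may be queued but there is nothing for the compactor to merge.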


Alan.

Sachin Pasalkar <mailto:[email protected]>
September 15, 2015 at 22:35
Yes, below are the values set in Hive. Initially I hadn't mentioned NO_AUTO_COMPACTION in my table definition, which didn't work, so I have added it with the value false.
hive.compactor.initiator.on
hive.compactor.worker.threads
hive.compactor.worker.timeout
hive.compactor.check.interval
hive.compactor.delta.num.threshold
hive.compactor.delta.pct.threshold
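
For context on the last two properties, the following is a simplified Java model of how the delta thresholds are documented to drive automatic compaction. It is not Hive's initiator code and it ignores edge cases. The class and method names are hypothetical, and the constants are the documented defaults, used here as assumptions because the values actually set are not shown above.

```java
final class CompactionDecision {
    enum Type { NONE, MINOR, MAJOR }

    // Documented defaults, assumed here for illustration.
    static final int DELTA_NUM_THRESHOLD = 10;      // hive.compactor.delta.num.threshold
    static final double DELTA_PCT_THRESHOLD = 0.1;  // hive.compactor.delta.pct.threshold

    /** Simplified decision based on base size, total delta size, and delta directory count. */
    static Type decide(long baseBytes, long deltaBytes, int deltaCount) {
        if (deltaCount == 0) {
            return Type.NONE; // no delta directories, so nothing to merge
        }
        if (baseBytes > 0 && (double) deltaBytes / baseBytes > DELTA_PCT_THRESHOLD) {
            return Type.MAJOR; // deltas are large relative to the base
        }
        if (deltaCount > DELTA_NUM_THRESHOLD) {
            return Type.MINOR; // many small deltas
        }
        return Type.NONE;
    }

    public static void main(String[] args) {
        // Files written outside the transactional layout produce no delta directories.
        System.out.println(decide(0L, 0L, 0));
    }
}
```

Under this model, files written directly into the table directory never count as deltas, so neither threshold is reached regardless of how many small files exist.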

Thanks,
Sachin

From: Alan Gates <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, 15 September 2015 10:30 pm
To: "[email protected]" <[email protected]>
Subject: Re: How to enable compaction for table with external data?
If you want it to compact automatically you should not put NO_AUTO_COMPACTION in the table properties.
First question, did you turn on the compactor on your metastore thrift server? To do this you need to set a couple of values in the metastore's hive-site.xml:
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1 # or more

Alan.

Sachin Pasalkar <mailto:[email protected]>
September 14, 2015 at 3:03
Hi,
We are writing ORC files directly from a Storm topology instead of using Hive Streaming (due to a performance issue with our data). However, we want to compact the data. So we have added the "NO_AUTO_COMPACTION"="false" option to the table we created to read the data (1.6 GB scattered across multiple small files) in ORC format. Does "NO_AUTO_COMPACTION" mean it will not compact data while Hive Streaming is used? If not, why did it not compact our data into one file?
We also tried manually calling compaction from Java code using org.apache.hadoop.hive.metastore.txn.TxnHandler's compact API, which shows it has started compaction when we execute the SHOW COMPACTIONS command. But it still does not work. I don't want to execute the manual commands from the command line.
Is there any way?

PS: We are writing all files in one directory only.

Thanks,
Sachin
