RE: Tables left over from regression test runs

Prashanth Vasudev Wed, 02 May 2018 10:30:57 -0700

This asynchronous drop could later be pushed down into TM for HA purposes. TM 
records this drop today and executes it as part of commit. Making this drop 
asynchronous in TM would mean recording this info, letting the commit go 
through, and then dropping the table asynchronously.
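A minimal Python sketch of the two behaviors described above. `Transaction`, `record_drop`, and the `drop_fn` callback (a stand-in for the physical HBase drop) are hypothetical names, not the actual TM or DTM interface:

```python
import threading
from typing import Callable, List


class Transaction:
    """Hypothetical model of a TM transaction that records table drops."""

    def __init__(self, drop_fn: Callable[[str], None], async_drops: bool = False):
        self.drop_fn = drop_fn            # stand-in for the physical HBase drop
        self.async_drops = async_drops
        self.pending: List[str] = []      # drops recorded during the transaction
        self.background: List[threading.Thread] = []

    def record_drop(self, table: str) -> None:
        # The drop is only recorded here; nothing is removed from HBase yet.
        self.pending.append(table)

    def commit(self) -> None:
        tables, self.pending = self.pending, []
        if not self.async_drops:
            # Current behavior: commit does not return until every drop finishes.
            for table in tables:
                self.drop_fn(table)
            return
        # Proposed behavior: commit returns at once; drops run on background threads.
        for table in tables:
            worker = threading.Thread(target=self.drop_fn, args=(table,))
            worker.start()
            self.background.append(worker)

    def wait_for_background_drops(self) -> None:
        for worker in self.background:
            worker.join()
        self.background.clear()
```

One question the sketch leaves open is durability: for HA, the recorded drops would presumably need to survive a failure between the commit and the background drop, whereas here they live only in memory.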

Regards,
Prashanth

-----Original Message-----
From: Anoop Sharma <[email protected]> 
Sent: Wednesday, May 2, 2018 10:24 AM
To: [email protected]
Subject: RE: Tables left over from regression test runs

For speeding up regressions, if all tables are being dropped, we can enable 
asynchronous drops.
Here we drop tables from metadata and then issue asynchronous HBase drops that 
run in parallel in the background. The drop table command succeeds as soon as 
the metadata drop succeeds.
If the same table is being recreated while an async drop is in progress, the 
create waits for the drop to finish.
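The flow described above can be sketched in Python as follows. This is a hypothetical model for illustration, not the ExpHbaseInterface code: `AsyncDropper`, its `metadata` set (standing in for the SQL catalog), and the `hbase_drop` callback (standing in for the physical HBase drop) are all assumptions.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Set


class AsyncDropper:
    """Hypothetical model of asynchronous drops; not the ExpHbaseInterface API."""

    def __init__(self, hbase_drop: Callable[[str], None], workers: int = 4):
        self.metadata: Set[str] = set()           # stand-in for the SQL catalog
        self.hbase_drop = hbase_drop              # stand-in for the physical HBase drop
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.in_flight: Dict[str, Future] = {}
        self.lock = threading.Lock()

    def drop_table(self, name: str) -> None:
        with self.lock:
            # 1. Drop from metadata; the DROP TABLE command succeeds at this point.
            self.metadata.discard(name)
            # 2. Queue the HBase drop to run in the background.
            self.in_flight[name] = self.pool.submit(self.hbase_drop, name)

    def create_table(self, name: str) -> None:
        with self.lock:
            pending = self.in_flight.get(name)
        if pending is not None:
            pending.result()    # a recreate waits for the earlier drop to finish
        with self.lock:
            # Forget the finished drop only if no newer drop replaced it meanwhile.
            if self.in_flight.get(name) is pending:
                self.in_flight.pop(name, None)
            self.metadata.add(name)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)
```

The lock only protects the bookkeeping; the wait in `create_table` happens outside it, so other drops and creates are not blocked while one recreate waits.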

For schema drops, the HBase tables that are part of the schema could be dropped 
asynchronously after dropping them from metadata.
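For the schema case, a minimal Python sketch, assuming a hypothetical `metadata` dict that maps schema names to their table names and a `hbase_drop` callback standing in for the physical drop (neither is a Trafodion API):

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Set


def drop_schema_async(
    schema: str,
    metadata: Dict[str, Set[str]],
    hbase_drop: Callable[[str], None],
    pool: ThreadPoolExecutor,
) -> Dict[str, Future]:
    """Hypothetical sketch: drop a schema from metadata, then drop its HBase
    tables concurrently in the background. Returns one future per table."""
    # Removing the schema from metadata first is what lets the command succeed early.
    tables = sorted(metadata.pop(schema, set()))
    # Each HBase drop is submitted to the pool, so the drops run in parallel.
    return {table: pool.submit(hbase_drop, table) for table in tables}
```

Callers that need to know when the schema's tables are completely gone can wait on the returned futures; callers that don't can return immediately.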

This mode has not been fully tested or externalized, as there were some 
regression issues when this change was initially added. I can't say what state 
it is in now; it will need to be tested.

Also, this was done before DDL transaction support was added, so some changes 
may be needed at the DTM layer to support drops that are done within a user 
transaction rather than in autocommit mode.

The changes are in ExpHbaseInterface::drop (look for the async param) and in 
executor/HBaseClient_JNI.cpp::drop.

anoop

-----Original Message-----
From: Hans Zeller <[email protected]>
Sent: Wednesday, May 2, 2018 8:38 AM
To: [email protected]
Subject: RE: Tables left over from regression test runs

Hi,

My two cents: +1 on Dave's suggestion to clean up the tests. I think some of 
these leftover tables appear when people comment out the cleanup code for 
debugging and then accidentally check that change into git.

About speeding up regressions: I really like Ming's idea ( 
https://issues.apache.org/jira/browse/TRAFODION-2953 ) of storing multiple 
Trafodion tables in a single HBase table and think that this could potentially 
speed things up by a lot.

One command that is particularly slow is "drop schema cascade". Is there a way 
to speed this up, maybe by using a flavor of the "cleanup" command instead?

Thanks,

Hans

-----Original Message-----
From: Qifan Chen <[email protected]>
Sent: Wednesday, May 2, 2018 7:32 AM
To: [email protected]
Subject: Re: Tables left over from regression test runs

There is actually an issue open against HBase requesting a quick way to disable 
a table before dropping it: https://issues.apache.org/jira/browse/HBASE-3557. 
The issue has been open since 2011.

Given that creating and dropping tables is slow, it may be a good idea to 
promote table reuse in general. We have been doing this for most of the Hive 
tests, using Hive tables created at local Hadoop setup time.

To allow HBase table reuse, we may need to name these tables more precisely, 
such as sb_056_t1 for a table used in SEABASE/TEST056.
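A naming helper along these lines could be sketched in Python. The `SUITE_PREFIXES` map and `reusable_table_name` function are hypothetical; only the SEABASE to `sb` mapping and the `sb_056_t1` shape come from the example above:

```python
import re

# Hypothetical suite-to-prefix map; only SEABASE -> "sb" comes from the example above.
SUITE_PREFIXES = {"SEABASE": "sb"}


def reusable_table_name(suite: str, test: str, table: str) -> str:
    """Build a per-test table name such as sb_056_t1 (hypothetical convention)."""
    match = re.fullmatch(r"TEST(\d+)", test.upper())
    if match is None:
        raise ValueError(f"unexpected test name: {test!r}")
    return f"{SUITE_PREFIXES[suite.upper()]}_{match.group(1)}_{table.lower()}"
```

Because every name embeds the suite and test number, a test can recognize its own tables on a reused instance, and all of one test's tables can be matched by a single pattern such as `sb_056_.*`.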

On memory used in the region servers, my understanding is that once a table is 
closed, the memory taken by that table becomes subject to GC.

Thanks --Qifan

________________________________
From: Sean Broeder <[email protected]>
Sent: Wednesday, May 2, 2018 8:16:07 AM
To: [email protected]
Subject: RE: Tables left over from regression test runs

It seems that, to accomplish what Dave is seeking, the tables should at least be 
disabled. Then, if you really want to go back and look at the contents, you 
could re-enable the tables, but in the meantime the extra memory would be freed 
up in the region server.

If the tables for a given test share a common name pattern, you might be able 
to leverage a pattern match with the disable_all command and disable them all 
in a single statement at the end of the test.
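In the HBase shell this would take the form `disable_all 'sb_056_.*'` (borrowing the `sb_056_t1` naming example from Qifan's message). The selection step can be modeled in a few lines of Python; this sketch does not call HBase, and it assumes the pattern must match the whole table name. `tables_matching` is a hypothetical name:

```python
import re
from typing import Iterable, List


def tables_matching(pattern: str, table_names: Iterable[str]) -> List[str]:
    """Return the tables a single regex-based disable would cover.

    Assumption: the regex must match the entire table name.
    """
    regex = re.compile(pattern)
    return sorted(name for name in table_names if regex.fullmatch(name))
```

Anchoring the pattern at a trailing underscore (`sb_056_`) keeps it from also catching a hypothetical `sb_0561_...` table from a different test.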

Regards,
Sean

-----Original Message-----
From: Sandhya Sundaresan <[email protected]>
Sent: Tuesday, May 1, 2018 9:14 PM
To: [email protected]
Subject: Re: Tables left over from regression test runs

Agree that each test needs to be a "good citizen" and clean up all its tables. 
Some tests have a "-noCleanup" option that skips the final cleanup step. 
That's a really useful option to have in every test if possible. But in some 
cases tables are created and dropped mid-test too. For those there is no choice 
but to modify the test if any kind of debugging needs the tables to stay 
around.
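The regression scripts themselves are not shown in this thread, so the following is only a Python illustration of how such a switch behaves; `run_test`, `body`, and `cleanup` are hypothetical names:

```python
import argparse
from typing import Callable, List, Optional


def run_test(body: Callable[[], None], cleanup: Callable[[], None],
             argv: Optional[List[str]] = None) -> bool:
    """Hypothetical test driver honoring a -noCleanup flag.

    Returns True if the final cleanup step ran.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-noCleanup", dest="no_cleanup", action="store_true",
                        help="skip the final cleanup step to keep tables for debugging")
    args = parser.parse_args(argv)
    try:
        body()
    finally:
        # Being a "good citizen": clean up even if the test body fails,
        # unless the developer explicitly asked to keep the objects.
        if not args.no_cleanup:
            cleanup()
    return not args.no_cleanup
```

The flag only controls the final step; tables that the test body itself drops mid-run are unaffected, which is why such tests still need to be edited for debugging.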

Thanks for looking into these,  Dave.

Sandhya

________________________________
From: Dave Birdsall <[email protected]>
Sent: Tuesday, May 1, 2018 5:03:00 PM
To: [email protected]
Subject: RE: Tables left over from regression test runs

Hi,

Regarding why stopping HBase takes a long time: I was watching the HBase log 
today while doing a swstophbase. It was closing the regions of each table 
individually, and it took a long time to get through all of them. Of course, 
one can always just kill the HMaster process (I sometimes do this), but that 
sometimes results in not being able to bring the instance up again, with loss 
of any working data. So that's risky.

Regarding time to drop tables: I'm noticing that many of the tests that don't 
drop tables at the end do so at the beginning. If they are run on a clean 
instance, that's fast (because the drop fails fast or uses "drop if exists"). 
If they are run on an instance where they have been run before, we pay the cost 
of dropping the tables anyway. Agreed, for Jenkins it's better because we just 
throw the instance away after one run. For developers who keep test tables 
around, it's not so good.

Regarding the convenience of having objects around when there's a need to debug 
something: I've been unlucky with this. Almost always, the particular object I
need is in a test that cleans up its objects. So I end up having to recreate it 
from a stripped down version of the test script. I suspect this is true more 
often than not. So I haven't found this particular argument persuasive.

Regarding speeding up HBase drop: Yes, that would be a great idea.

Dave

-----Original Message-----
From: Anoop Sharma <[email protected]>
Sent: Tuesday, May 1, 2018 4:51 PM
To: [email protected]
Subject: RE: Tables left over from regression test runs

Yes, it is true that some tests do not drop all the tables that are created as 
part of that test.
This is not always intentional; at times it is because someone forgot to clean 
them up.

But there are some advantages of not dropping tables at the end of a test run.

- Dropping HBase tables takes a non-trivial amount of time. Dropping all tables 
will increase the time it takes to run a test.
  This will also impact Jenkins, since it runs tests after init traf, which 
already cleans up everything.
- Is there a way to make dropping a table, or a whole schema, faster? 
Using concurrent drops? Or dropping without disabling (disable is where most of 
the time is spent, due to the memstore flush)? There is an HBase JIRA on the 
drop issue, but no one has volunteered to fix it.
- Some tables are permanent (like those from QAT) and should not be cleaned up.
- Many tests drop tables at the beginning of the test or have an 'if not 
exists' clause.
- One advantage of not dropping a table at the end is that sometimes an issue 
can be diagnosed without having to recreate the table and its dependent 
objects.
- If the only objects on a dev instance are from regression tests, then doing 
ilh_trafinit will be a much faster way to clean up everything after full 
regressions.
  But this would also nuke any non-regression traf objects, so one needs to be 
careful about it.
- We should also find out why stopping HBase takes a long time. Is there 
something that can be done to 'stop abrupt' on a dev platform?

anoop

-----Original Message-----
From: Dave Birdsall <[email protected]>
Sent: Tuesday, May 1, 2018 3:57 PM
To: [email protected]
Subject: Tables left over from regression test runs

Hi,

I've noticed after running full regressions that there are a boatload of tables 
that don't get cleaned up.

These tables occupy regions in our instance's region server, and I think they 
may cause excessive memory usage and/or increasingly long times when stopping 
HBase.

So, I'm thinking about cleaning up some of our regression tests to drop these 
tables when they finish.

Does anyone object to this? Or is there some pressing need to keep any of these 
tables around after regressions complete?

Thanks,

Dave
