[
https://issues.apache.org/jira/browse/HBASE-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762742#comment-13762742
]
Vasu Mariyala commented on HBASE-9490:
--------------------------------------
The possible solutions are:
a) Move the tests that change the static variables to the medium category
([~lhofhansl]'s suggestion).
b) Run the tests in separate JVMs (3:05.855s vs 5:09.840s on my local
machine). With -PlocalTests, the small tests already run in separate JVMs, so
the time increase would mostly affect the build machine.
> Provide independent execution environment for small tests
> ---------------------------------------------------------
>
> Key: HBASE-9490
> URL: https://issues.apache.org/jira/browse/HBASE-9490
> Project: HBase
> Issue Type: Improvement
> Reporter: Vasu Mariyala
> Assignee: Vasu Mariyala
>
> Some of the state related to schema metrics is stored in static variables.
> Since the small test cases run in a single JVM, this state leaks between
> tests and the test results vary with the order of execution.
> An example scenario is the test case failures in HBASE-8930
> {code}
> for (SchemaMetrics cfm : tableAndFamilyToMetrics.values()) {
>   if (metricName.startsWith(CF_PREFIX + CF_PREFIX)) {
>     throw new AssertionError("Column family prefix used twice: " +
>         metricName);
>   }
> {code}
> The above code throws an error when the metric name starts with "cf.cf.". It
> would be helpful if anyone could shed some light on the reason behind
> checking for "cf.cf."
> The scenarios in which we would have a metric name start with "cf.cf." are as
> follows (See generateSchemaMetricsPrefix method of SchemaMetrics)
> a) The column family name should be "cf"
> AND
> b) The table name is either "" or use table name globally should be false
> (useTableNameGlobally variable of SchemaMetrics).
> Table name is empty only in the case of ALL_SCHEMA_METRICS, which has the
> column family as "". So we can rule out the possibility of the table name
> being empty.
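The two conditions above can be illustrated with a minimal Java sketch. This is not the HBase source: the class PrefixSketch, the method generatePrefix and the value "tbl." for the table prefix are assumptions made for illustration; only the "cf." prefix is implied by the failure message quoted in this issue.

```java
// Hypothetical stand-in for SchemaMetrics.generateSchemaMetricsPrefix.
// Names and the "tbl." value are assumptions, not the HBase implementation.
final class PrefixSketch {
    static final String TABLE_PREFIX = "tbl."; // assumed value
    static final String CF_PREFIX = "cf.";     // consistent with "cf.cf." in the error

    // Mirrors the static flag described in the issue: shared JVM-wide.
    static boolean useTableNameGlobally = true;

    static String generatePrefix(String tableName, String cfName) {
        // The table part is dropped when the name is empty or the flag is false.
        String effectiveTable = useTableNameGlobally ? tableName : "";
        StringBuilder sb = new StringBuilder();
        if (!effectiveTable.isEmpty()) {
            sb.append(TABLE_PREFIX).append(effectiveTable).append('.');
        }
        if (!cfName.isEmpty()) {
            sb.append(CF_PREFIX).append(cfName).append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Flag true: the table part comes first, so the name cannot start with "cf.cf."
        System.out.println(generatePrefix("testtable", "cf")); // tbl.testtable.cf.cf.
        // Flag false and family "cf": the prefix collapses to "cf.cf."
        useTableNameGlobally = false;
        System.out.println(generatePrefix("testtable", "cf")); // cf.cf.
    }
}
```

Under these assumptions, a family named "cf" only yields a name starting with "cf.cf." once the table part is dropped, which matches conditions (a) and (b).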
> Also to note, the variables "useTableNameGlobally" and
> "tableAndFamilyToMetrics" of SchemaMetrics are static and are shared across
> all the tests that run in a single jvm. In our case, the profile runAllTests
> has the below configuration
> {code}
> <surefire.firstPartForkMode>once</surefire.firstPartForkMode>
> <surefire.firstPartParallel>none</surefire.firstPartParallel>
> <surefire.firstPartThreadCount>1</surefire.firstPartThreadCount>
>
> <surefire.firstPartGroups>org.apache.hadoop.hbase.SmallTests</surefire.firstPartGroups>
> {code}
> Hence all of our small tests run in a single jvm and share the above
> variables "useTableNameGlobally" and "tableAndFamilyToMetrics".
> The reasons why the order of execution of the tests caused this failure are
> as follows
> a) A bunch of small tests, like TestMemStore and TestSchemaConfigured, set
> useTableNameGlobally to false. But these tests don't create tables that have
> the column family name "cf".
> b) If the tests in step (a) run before the tests which create table/regions
> with column family 'cf', metric names would start with "cf.cf."
> c) If any other tests, such as the failed ones (TestScannerSelectionUsingTTL,
> TestHFileReaderV1, TestScannerSelectionUsingKeyRange), validate schema
> metrics, they would fail as the metric names start with "cf.cf."
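Steps (a) to (c) can be simulated in plain Java. The sketch below is hypothetical: SharedStateSketch and its methods are illustrative stand-ins, not HBase or JUnit code, and the "tbl." table prefix is an assumption. It only shows how static state shared within one JVM makes the outcome depend on execution order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of order-dependent failures caused by static state.
final class SharedStateSketch {
    // Static fields survive from one "test" to the next within the same JVM.
    static boolean useTableNameGlobally = true;
    static final List<String> metricNames = new ArrayList<>();

    // Simulates starting a fresh JVM.
    static void reset() {
        useTableNameGlobally = true;
        metricNames.clear();
    }

    // Step (a): flips the flag but creates no "cf" column family.
    static void testThatDisablesTableNames() {
        useTableNameGlobally = false;
    }

    // Step (b): registers a metric for table 'testtable', column family 'cf'.
    static void testThatCreatesCfRegion() {
        String tablePart = useTableNameGlobally ? "tbl.testtable." : ""; // "tbl." is assumed
        metricNames.add(tablePart + "cf.cf." + "someMetric");
    }

    // Step (c): validation rejects any name that starts with "cf.cf."
    static boolean validationPasses() {
        for (String name : metricNames) {
            if (name.startsWith("cf.cf.")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        reset();
        testThatCreatesCfRegion();
        testThatDisablesTableNames();
        System.out.println("region first: " + validationPasses()); // region first: true

        reset();
        testThatDisablesTableNames();
        testThatCreatesCfRegion();
        System.out.println("flag first: " + validationPasses());   // flag first: false
    }
}
```

Calling reset() between the two orders plays the role of a separate JVM per test class: each run starts from the default state, so earlier tests cannot influence later ones.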
> On my local machine, I tried to re-create the failure scenario by changing
> the Surefire test configuration and creating a simple test (TestSimple)
> which just creates a region for the table 'testtable' and column family 'cf'.
> {code}
> TestSimple.java
> ------------------------------------------------------------------
> @Before
> public void setUp() throws Exception {
>   HTableDescriptor htd = new HTableDescriptor(TABLE_NAME_BYTES);
>   htd.addFamily(new HColumnDescriptor(FAMILY_NAME_BYTES));
>   HRegionInfo info = new HRegionInfo(TABLE_NAME_BYTES, null, null, false);
>   this.region = HRegion.createHRegion(info, TEST_UTIL.getDataTestDir(),
>       TEST_UTIL.getConfiguration(), htd);
>   Put put = new Put(ROW_BYTES);
>   for (int i = 0; i < 10; i += 2) {
>     // puts 0, 2, 4, 6 and 8
>     put.add(FAMILY_NAME_BYTES, Bytes.toBytes(QUALIFIER_PREFIX + i), i,
>         Bytes.toBytes(VALUE_PREFIX + i));
>   }
>   this.region.put(put);
>   this.region.flushcache();
> }
>
> @Test
> public void testFilterInvocation() throws Exception {
>   System.out.println("testing");
> }
>
> @After
> public void tearDown() throws Exception {
>   HLog hlog = region.getLog();
>   region.close();
>   hlog.closeAndDelete();
> }
>
> Successful run:
> -------------------------------------------------------
> T E S T S
> -------------------------------------------------------
> 2013-09-09 15:38:03.478 java[46562:db03] Unable to load realm mapping info
> from SCDynamicStore
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.085 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.217 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.618 sec
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.542 sec
> Results :
> Tests run: 35, Failures: 0, Errors: 0, Skipped: 0
> ------------------------------------------------------------------
> Failed run order:
> -------------------------------------------------------
> T E S T S
> -------------------------------------------------------
> 2013-09-09 15:43:21.466 java[46890:db03] Unable to load realm mapping info
> from SCDynamicStore
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.967 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.659 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.15 sec
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.031 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 24.883 sec
> <<< FAILURE!
> Results :
> Failed tests:
> testReadingExistingVersion1HFile(org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1):
> Column family prefix used twice: cf.cf.bt.Data.fsReadnumops
> Tests run: 35, Failures: 1, Errors: 0, Skipped: 0
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> {code}
> In the failed run, the following happened:
> a) TestMemStore sets useTableNameGlobally to false.
> b) TestScannerSelectionUsingTTL and TestScannerSelectionUsingKeyRange
> succeed because they don't create a table with column family name "cf".
> c) TestSimple creates a region for table 'testtable' and column family 'cf'.
> Since useTableNameGlobally is set to false, it creates metric names that
> start with "cf.cf."
> d) TestHFileReaderV1 fails while validating metrics because the metric names
> start with "cf.cf."
> This patch exposed the problem because TestSimple stands in for
> TestInvocationRecordFilter. Builds 1136, 1137 and 1138, which ran after this
> patch, executed the tests in a different order than the failed builds 1139
> and 1140.
> One simple fix to address the issue would have been to change the column
> family name from "cf" to "mycf" in the TestInvocationRecordFilter. But to
> avoid future occurrences of these issues, I would suggest setting the
> "surefire.firstPartForkMode" to "always" similar to the settings we use while
> running localTests, medium & large tests.