GeorgeJahad opened a new pull request #2703:
URL: https://github.com/apache/ozone/pull/2703


   ## What changes were proposed in this pull request?
   
   When the OM-HA code was added, the tests were not completely updated.  Many 
of them still call the OzoneManagerProtocol "write-path" methods even though 
those methods have been superseded by the HA methods.  For this reason the 
obsolete protocol methods in the OzoneManager class could not be removed.
   
   This PR replaces all those obsolete methods with default methods in 
OzoneManagerProtocol.java that just throw an UnsupportedOperationException.  
   
   It also fixes the tests broken by those changes.  
   
   ## Unremoved methods
   The following methods are in the OzoneManager class, on the write path.  
They were not removed because they are invoked from with the OmRequest 
subclasses:
   ```
   OzoneManager::finalizeUpgrade
   OzoneManager::getDelegationToken
   OzoneManager::renewDelegationToken
   OzoneManager::cancelDelegationToken
   ```
   
   ## Other dead code removed
   In my testing, I noticed a couple of other methods that are no longer used 
and removed them as well:
   
   OzoneManager::checkVolumeAccess is unsupported in the 
[VolumeManagerImpl](https://github.com/apache/ozone/blob/fcd016021452d14371e3335982ee0f7cb72fac5c/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManagerImpl.java#L386)
   
   
[OzoneManagerRequestHandler::allocateBlock](https://github.com/apache/ozone/blob/e5c647eb65b896d629f8e20e066cccae32834d80/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OzoneManagerRequestHandler.java#L458)
 is a private method that is not called from anywhere
   
   
   ## Complex Fixes
   
   Most of the broken tests were easy to fix.  Basically any the tests that 
invoked the removed methods directly from the OzoneManager.class were modified 
to invoke them from an OzoneManager client.  There were a couple of difficult 
areas:
   
   ### TestOmMetrics class
   
   These tests verify that the correct number of metrics are being generated.  
Most of them use mocks invoked by the methods removed in this PR.  So they had 
to be significantly restructured.  In most cases the mocks had to be replaced 
with a miniCluster, (because there is no other way to generate the metrics.)
   
   My goal was to fix the tests so the number of metrics would not change.  I 
was able to do that in all cases except one. One of the tests uses the 
checkVolumeAccess() method, which was removed months ago, (as mentioned above). 
 The old mocks were incorrectly ignoring the exception it has been generating. 
So I removed it from the test and corrected the corresponding counts.  The rest 
of the counts are unchanged.
   
   ### listStatus() race condition while key is in cache
   Under certain circumstances, listStatus() fails to return a key that is in 
the cache, (but not in the keyTable).  This race did not occur prior to this PR 
because the test was using the pre-HA version which doesn't use the cache. 
   
   Since it works when the key is in the table, I fixed the test by retrying 
until it is read from the table, but it is possible that listStatus() itself 
should be fixed.  The details of the race are a bit complicated, but explained 
below.
   
   #### listStatus() race condition details
   
   With the commitKey OmRequest, the key is first written to the keyTableCache 
and then to the keyTable when the double buffer is flushed.
   
   If the OZONE_OM_ENABLE_FILESYSTEM_PATHS is enabled, and the key looks like a 
file name, the key's superdirectory's are added to cache during the openKey 
[OmRequest](https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCreateRequest.java#L270)
   
   If OZONE_OM_ENABLE_FILESYSTEM_PATHS is not enabled, the superdirectories are 
not added in the openkey request.  The cache search method in listStatus() is 
written to skip any keys with an embedded "/" 
[here](https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2285)
   
   So, if OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled, and a key with an 
embedded "/" is commited, and it is still in the cache, listStatus() won't see 
it.
   
   listStatus() does return the key when it gets flushed to the table, so 
sleeping for a bit fixes the problem.
   
   I've only noticed this happening on one test and only 20% of the time.  
(I've put a sleep in to handle that case.)  But we should discuss if that is 
good enough.
   
   ## Genesis Broken in Master
   `ozone genesis -b BenchMarkOzoneManager` uses the pre-HA code and so should 
also be updated for these changes.  
   
   It is currently broken in the master branch.  It generates this exception 
when I try to run it in master:
   ```
   org.apache.hadoop.metrics2.MetricsException: Metrics source 
DBCheckpointMetrics already exists!
   ```
   
   I think that error will need to be fixed before I can update it, so I've 
left it out of this PR.  Let me know if that's a problem.
   
   Since the CI tests don't run Genesis, the lack of updates don't cause the 
tests to fail.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3371
   
   ## How was this patch tested?
   
   All effected tests were updated and all pass without fail.
   
   I timed all the changed tests. Before the changes, they took 12 minutes on 
average.  Now they take 12.5 minutes on average.
   
   I ran the CI 12 times each on master and on this branch.  master failed 6 of 
those times, this branch failed 6 of those times.  While there was some overlap 
in the failures, some were unique to master and some to this branch.
   
   As best I could tell, none of the failures were related to these changes, 
but it is hard to be sure.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to