Hi all,

For large metastores, the HDFS path sync can currently take up to 10 minutes at startup. We need to reduce the startup time of the Hive Metastore (HMS), as documented in the Jira <https://issues.apache.org/jira/browse/SENTRY-990>. Here are the proposed solutions:

Solution 1: During initialization, split the full set of updates into small chunks, so that startup does not block while waiting for the updates to be sent. The plugin can then send the updates to the Sentry service, based on the delta, after HMS has started.

Problems:
- How do we decide when to cut a chunk? We could use a configurable timer or a limit on the number of path updates per chunk.
- How do we track the delta and the order of the requests? We can reuse the current update sequence number mechanism.
- How does this work with HA? (Need some input here.)
- How do customers work with the new design, especially during startup?
- Client-side connections need to be thread safe.

Solution 2: Use a lazy update mechanism: update the paths on demand, driven by NameNode requests. I do not prefer this approach, since it can hurt the performance of the HDFS plugin.

Any opinions on the proposals?

Thanks a lot!

Best,
Hao
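
P.S. To make Solution 1 a bit more concrete, here is a rough Java sketch of the chunking and ordering idea. It is only an illustration under my own assumptions: the class and method names (ChunkedPathSyncSketch, PathChunk, Receiver) are hypothetical and are not existing Sentry or HMS APIs. It uses a fixed path-count limit per chunk and a receiver that applies chunks only in strict sequence order. The timer-based cut and HA are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these names are real Sentry APIs. */
final class ChunkedPathSyncSketch {

    /** One chunk of path updates, tagged with a sequence number. */
    static final class PathChunk {
        final long seqNum;
        final List<String> paths;

        PathChunk(long seqNum, List<String> paths) {
            this.seqNum = seqNum;
            this.paths = paths;
        }
    }

    /** Splits all paths into chunks of at most maxPaths entries, numbered from firstSeqNum. */
    static List<PathChunk> chunk(List<String> allPaths, int maxPaths, long firstSeqNum) {
        if (maxPaths <= 0) {
            throw new IllegalArgumentException("maxPaths must be positive");
        }
        List<PathChunk> chunks = new ArrayList<>();
        long seq = firstSeqNum;
        for (int start = 0; start < allPaths.size(); start += maxPaths) {
            int end = Math.min(start + maxPaths, allPaths.size());
            chunks.add(new PathChunk(seq++, new ArrayList<>(allPaths.subList(start, end))));
        }
        return chunks;
    }

    /** Stand-in for the receiving side: applies chunks strictly in sequence order. */
    static final class Receiver {
        private long lastApplied;
        private final List<String> appliedPaths = new ArrayList<>();

        Receiver(long lastApplied) {
            this.lastApplied = lastApplied;
        }

        /** Returns true if the chunk was applied, false if it is a duplicate or out of order. */
        synchronized boolean apply(PathChunk chunk) {
            if (chunk.seqNum != lastApplied + 1) {
                return false;
            }
            appliedPaths.addAll(chunk.paths);
            lastApplied = chunk.seqNum;
            return true;
        }

        synchronized long lastApplied() {
            return lastApplied;
        }

        synchronized List<String> appliedPaths() {
            return new ArrayList<>(appliedPaths);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> paths = Arrays.asList(
                "/warehouse/db1/t1", "/warehouse/db1/t2", "/warehouse/db2/t1",
                "/warehouse/db2/t2", "/warehouse/db3/t1");
        List<PathChunk> chunks = chunk(paths, 2, 1L);
        Receiver receiver = new Receiver(0L);

        // Startup would return here; the chunks are sent from a background thread.
        Thread sender = new Thread(() -> {
            for (PathChunk c : chunks) {
                receiver.apply(c);
            }
        });
        sender.start();
        sender.join();

        System.out.println("chunks sent: " + chunks.size()
                + ", last applied seq: " + receiver.lastApplied());
    }
}
```

The receiver rejects any chunk whose sequence number is not exactly one past the last applied one, so a retried or reordered chunk cannot corrupt the path state. A real sender would need to retry or resync on rejection.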
