Is it possible to have a local file to cache the paths/privileges?
When the data sync starts, load the data from the local file first, then ask
the Sentry service only for the missing part.
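
A minimal sketch of what that startup path could look like, assuming each
cached line carries the update's sequence number so that only the missing
tail has to be requested from the Sentry service (all class and method names
below are hypothetical, not existing Sentry APIs):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: load cached path updates from a local file, then ask the
 * Sentry service just for the updates newer than the last cached sequence
 * number. Names are illustrative, not existing Sentry classes.
 */
public class CachedPathLoader {

    /** Hypothetical client abstraction for fetching updates from the Sentry service. */
    public interface SentryUpdateFetcher {
        List<String> fetchUpdatesSince(long seqNum) throws IOException;
    }

    private final Path cacheFile;
    private final SentryUpdateFetcher fetcher;

    public CachedPathLoader(Path cacheFile, SentryUpdateFetcher fetcher) {
        this.cacheFile = cacheFile;
        this.fetcher = fetcher;
    }

    /** Loads cached updates first, then fills in the missing tail from the service. */
    public List<String> loadOnStartup() throws IOException {
        List<String> updates = new ArrayList<>();
        long lastSeqNum = 0L;

        if (Files.exists(cacheFile)) {
            // Each cached line is assumed to be "<seqNum>\t<serialized update>".
            for (String line : Files.readAllLines(cacheFile)) {
                String[] parts = line.split("\t", 2);
                lastSeqNum = Long.parseLong(parts[0]);
                updates.add(parts[1]);
            }
        }

        // Only the updates the local cache does not have are requested from Sentry.
        updates.addAll(fetcher.fetchUpdatesSince(lastSeqNum));
        return updates;
    }
}
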
When polling for path/privilege updates, write to the local file once the
number of updates exceeds a threshold, e.g., flush to the file after every
1000 updates.
For the HA part, the local file would need to be kept in sync between the
HA instances.
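
And a sketch of the threshold-based write, buffering polled updates in memory
and appending them to the local file once the buffer reaches the configured
size (again, all names are only illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: buffer polled updates and append them to the local cache
 * file every N updates (e.g. 1000).
 */
public class ThresholdCacheWriter {

    private final Path cacheFile;
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();

    public ThresholdCacheWriter(Path cacheFile, int flushThreshold) {
        this.cacheFile = cacheFile;
        this.flushThreshold = flushThreshold;
    }

    /** Records one polled update; flushes to disk when the threshold is reached. */
    public synchronized void record(long seqNum, String update) throws IOException {
        buffer.add(seqNum + "\t" + update);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Appends the buffered updates to the cache file and clears the buffer. */
    public synchronized void flush() throws IOException {
        Files.write(cacheFile, buffer,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.clear();
    }
}
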
For customers, nothing changes with this solution, and the startup
performance should improve.
For developers, the implementation does not change the current design.
Just a rough idea, feel free to discuss.

Best regards,

Colin Ma(Ma Jun Jie)

-----Original Message-----
From: Hao Hao [mailto:[email protected]] 
Sent: Tuesday, January 5, 2016 6:46 AM
To: [email protected]
Subject: Re: [DISCUSS] Improve the load time for HMS startup for HDFS paths sync

Any opinions? Thanks!

Best,
Hao

On Thu, Dec 17, 2015 at 11:54 PM, Hao Hao <[email protected]> wrote:

> Hi all,
>
> Now for large metastores, HDFS path sync can take up to 10 minutes to
> start up. We need to improve the current load time for starting the Hive
> Metastore, which is documented in the Jira
> <https://issues.apache.org/jira/browse/SENTRY-990>. I propose the
> following solutions:
>
> Solution 1: During initialization, chunk all updates into small pieces and
> do not block startup by waiting for the updates to be sent.
> The plugin can send the updates to the Sentry service based on the delta
> after HMS starts.
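
A rough sketch of how this chunked, non-blocking send could look, assuming a
fixed chunk size and an increasing sequence number to preserve ordering
(class and method names are hypothetical, not existing Sentry APIs):

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: split the full path snapshot into fixed-size chunks and send
 * them to the Sentry service from a background thread after HMS has started,
 * tagging each chunk with an increasing sequence number.
 */
public class ChunkedUpdateSender {

    /** Hypothetical sender abstraction for one chunk of path updates. */
    public interface ChunkSender {
        void send(long seqNum, List<String> chunk);
    }

    private final int chunkSize;
    private final ChunkSender sender;

    public ChunkedUpdateSender(int chunkSize, ChunkSender sender) {
        this.chunkSize = chunkSize;
        this.sender = sender;
    }

    /** Sends all paths in chunks on a background thread so startup is not blocked. */
    public void sendAsync(List<String> allPaths) {
        Thread worker = new Thread(() -> {
            long seqNum = 0L;
            for (int i = 0; i < allPaths.size(); i += chunkSize) {
                List<String> chunk = new ArrayList<>(
                    allPaths.subList(i, Math.min(i + chunkSize, allPaths.size())));
                sender.send(seqNum++, chunk);
            }
        }, "hms-path-chunk-sender");
        worker.setDaemon(true);
        worker.start();
    }
}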
>
>
> Problems:
>
>    - How to decide when to chunk? We can have a configurable timer or a
>    limit on the number of path updates to decide the chunk boundaries.
>    - How to track the delta and the order of the requests? Make use of
>    the current update sequence number mechanism.
>    - How to work with HA? (Need some inputs here.)
>    - How do customers work with the new design? (Especially during
>    startup.)
>    - Client side connections need to be thread safe.
>
>
> Solution 2: Have a lazy updating mechanism: update the path based on the
> NameNode request. I do not prefer this approach, since it can impact the
> performance of the HDFS plugin.
> Any opinions about the proposal? Thanks a lot!
>
> Best,
> Hao
>
