[ 
https://issues.apache.org/jira/browse/SENTRY-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154816#comment-16154816
 ] 

Alexander Kolbasov commented on SENTRY-1915:
--------------------------------------------

The idea of the fix is to collapse all the layers and move the code directly 
into the SentryStore, where we read the objects. As we read them, we convert 
each path to a list of components and add them directly to the HMSPath() object.
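
A minimal sketch of that idea, assuming a simple tree keyed by path component. 
PathTreeSketch, Node, toComponents and addPath are hypothetical names used only 
for illustration; this is not the actual HMSPaths or SentryStore code and not 
the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a path tree; NOT Sentry's real HMSPaths class. */
final class PathTreeSketch {
    /** One path component; children are keyed by component name. */
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();
        boolean isPathEnd; // true if a complete path terminates at this node
    }

    private final Node root = new Node();
    private int nodeCount = 0; // number of component nodes, excluding the root

    /** Split a path on slashes, dropping empty components. */
    static List<String> toComponents(String path) {
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    /** Add one path straight into the tree as it is read from the store. */
    void addPath(String path) {
        Node current = root;
        for (String component : toComponents(path)) {
            Node next = current.children.get(component);
            if (next == null) {
                next = new Node();
                current.children.put(component, next);
                nodeCount++;
            }
            current = next;
        }
        current.isPathEnd = true;
    }

    /** Return true only if exactly this path was added earlier. */
    boolean contains(String path) {
        Node current = root;
        for (String component : toComponents(path)) {
            current = current.children.get(component);
            if (current == null) {
                return false;
            }
        }
        return current.isPathEnd;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        PathTreeSketch tree = new PathTreeSketch();
        tree.addPath("/user/hive/warehouse/db1/t1");
        tree.addPath("/user/hive/warehouse/db1/t2");
        // The shared prefix /user/hive/warehouse/db1 is stored only once.
        System.out.println("nodes: " + tree.nodeCount());
    }
}
```

The point of the sketch is that no intermediate list of paths is built: each 
path goes into the tree as soon as it is read.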

> Sentry is doing a lot of work to convert list of paths to HMSPaths structure
> ----------------------------------------------------------------------------
>
>                 Key: SENTRY-1915
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1915
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.0.0
>            Reporter: Alexander Kolbasov
>            Assignee: Alexander Kolbasov
>         Attachments: SENTRY-1915.01.patch
>
>
> It turns out that in 2.0 we changed the way full snapshots are sent from 
> Sentry to HDFS. Previously they used {{HMSPaths}}, which used a tree 
> structure and eliminated some duplication. SENTRY-1827 also helped to 
> compress this on the serialization side.
> Now we use the {{TPathChanges}} structure, which is not tree-based and 
> represents paths very inefficiently: 
> {{required list<list<string>> addPaths;}}. We split each path on slashes and 
> store a list of elements instead of storing a tree. As a result, we may use 
> much more memory.
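
To illustrate the duplication described above, here is a rough sketch that 
counts how many component strings a list-of-lists layout holds versus how many 
nodes a tree would need for the same paths. PathStorageCount and its methods 
are hypothetical helpers, and the sample paths are made up; the sketch counts 
elements only and does not measure real memory use in Sentry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper comparing the two layouts; not part of Sentry. */
final class PathStorageCount {
    /** Elements stored when every path keeps its own full list of components. */
    static int flatElementCount(List<String> paths) {
        int count = 0;
        for (String path : paths) {
            for (String part : path.split("/")) {
                if (!part.isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Nodes a tree needs: one per distinct prefix shared across all paths. */
    static int treeNodeCount(List<String> paths) {
        Set<String> prefixes = new HashSet<>();
        for (String path : paths) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.split("/")) {
                if (!part.isEmpty()) {
                    prefix.append('/').append(part);
                    prefixes.add(prefix.toString());
                }
            }
        }
        return prefixes.size();
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/user/hive/warehouse/db1/t1",
                "/user/hive/warehouse/db1/t2");
        System.out.println("flat elements: " + flatElementCount(paths));
        System.out.println("tree nodes:    " + treeNodeCount(paths));
    }
}
```

The gap between the two counts grows with the number of paths that share a 
long common prefix, such as tables under one warehouse directory.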



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
