[ https://issues.apache.org/jira/browse/SENTRY-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151945#comment-16151945 ]
Alexander Kolbasov commented on SENTRY-1915:
--------------------------------------------
It turns out that in the end we do use PathDump structures to send full
snapshots to HDFS. We are just not very efficient in handling them: we create
a lot of intermediate structures before we get there.
> Sentry should use old PathDump structures to send full snapshots to HDFS
> ------------------------------------------------------------------------
>
> Key: SENTRY-1915
> URL: https://issues.apache.org/jira/browse/SENTRY-1915
> Project: Sentry
> Issue Type: Bug
> Components: Sentry
> Affects Versions: 2.0.0
> Reporter: Alexander Kolbasov
>
> It turns out that in 2.0 we changed the way full snapshots are sent from
> Sentry to HDFS. Previously they used {{HMSPaths}}, which uses a tree
> structure and eliminates some duplication. SENTRY-1827 also helped to
> compress this on the serialization side.
> Now we are using the {{TPathChanges}} structure, which is not tree-based and
> represents paths very inefficiently: {{required list<list<string>> addPaths;}}.
> We split each path on slashes and store the list of its components instead
> of storing a tree. As a result, we may use much more memory.
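To make the duplication concrete, here is a minimal, purely illustrative sketch comparing how many path components a flat {{list<list<string>>}} stores against how many nodes a prefix tree needs for the same paths. The {{PathTrie}} class, the helper names and the sample paths are hypothetical; this is not Sentry's {{HMSPaths}} implementation, and the counts are list entries and tree nodes, not bytes (real memory use also depends on string sharing and serialization).

```python
from typing import Dict, List, Tuple


def flat_components(paths: List[str]) -> List[List[str]]:
    """Split each path on '/' into its components, the way a
    list<list<string>> field stores one inner list per path.
    Empty components (e.g. from a leading '/') are dropped."""
    return [[c for c in p.split("/") if c] for p in paths]


class PathTrie:
    """Hypothetical minimal prefix tree: one node per distinct path prefix,
    so shared prefixes such as /user/hive/warehouse are stored only once."""

    def __init__(self) -> None:
        self.children: Dict[str, "PathTrie"] = {}

    def add(self, components: List[str]) -> None:
        node = self
        for c in components:
            # Reuse an existing child for a shared prefix, else create one.
            node = node.children.setdefault(c, PathTrie())

    def node_count(self) -> int:
        # Counts every node below this one; the root itself is not counted.
        return sum(1 + child.node_count() for child in self.children.values())


def compare(paths: List[str]) -> Tuple[int, int]:
    """Return (components stored in the flat form, nodes in the tree form)."""
    flat = flat_components(paths)
    stored_flat = sum(len(p) for p in flat)
    trie = PathTrie()
    for comps in flat:
        trie.add(comps)
    return stored_flat, trie.node_count()


if __name__ == "__main__":
    sample = [
        "/user/hive/warehouse/db1.db/t1",
        "/user/hive/warehouse/db1.db/t2",
        "/user/hive/warehouse/db1.db/t3",
        "/user/hive/warehouse/db2.db/t1",
    ]
    print(compare(sample))  # (20, 9)
```

With these four sample paths the flat form repeats {{user}}, {{hive}} and {{warehouse}} in every inner list (20 entries), while the tree stores each shared prefix once (9 nodes). The gap grows with the number of tables and partitions under a common warehouse directory.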
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)