[
https://issues.apache.org/jira/browse/KUDU-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328376#comment-16328376
]
Todd Lipcon commented on KUDU-2263:
-----------------------------------
As a point of reference, the consensus-meta file for a single master is 10114
bytes, of which fewer than 100 bytes are the actual protobuf.
> Consider removing PB descriptors from PBC header
> ------------------------------------------------
>
> Key: KUDU-2263
> URL: https://issues.apache.org/jira/browse/KUDU-2263
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Todd Lipcon
> Priority: Major
>
> Looking at a cmeta file on disk, it seems the vast majority of the bytes are
> in the supplemental header. We currently serialize the entire descriptor set
> of the referenced file and its dependencies. This means that in each cmeta
> file, we end up serializing even things like the definition of SchemaPB, which
> is not needed to serialize the type at hand and is quite large.
>
> At a minimum we can prune the descriptors serialized to only include those
> that are transitively referenced by the PB type in the file. I think we
> should also consider doing away with this information entirely and instead
> allow 'kudu pbc dump' to take a descriptor set as external input – it's easy
> enough to generate a descriptor set from any kudu version source tree using
> the protoc command line.
>
> One potential major improvement: if we can get these files down to under 4 KB,
> we could atomically rewrite them in a single disk IO using O_DIRECT rather
> than doing a rewrite-rename-fsync dance.