[ https://issues.apache.org/jira/browse/KUDU-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley updated KUDU-2693:
--------------------------------
Code Review: https://gerrit.cloudera.org/#/c/12425/
> Buffer DiskRowSet flushes to more efficiently write many columns
> ----------------------------------------------------------------
>
> Key: KUDU-2693
> URL: https://issues.apache.org/jira/browse/KUDU-2693
> Project: Kudu
> Issue Type: Improvement
> Components: fs, tablet
> Affects Versions: 1.9.0
> Reporter: Mike Percy
> Assignee: Todd Lipcon
> Priority: Major
>
> A trace of some MRS flushes on a table with 280 columns showed that some
> 695 fdatasync() calls occurred during the course of the flush.
> One possible way to minimize the number of fdatasync() calls would be to
> flush to memory buffers first, determine the ideal on-disk layout for the
> flushed blocks (possibly striped across one log block container per data
> disk), and then potentially write the data out to the containers in parallel.
> This would require some memory buffer space to be reserved per maintenance
> manager thread, possibly 64MB since the DRS roll size is 32MB.
> According to Todd, we could probably do it all in LogBlockManager, for
> example by adding a new flag to CreateBlockOptions that says whether the
> block's writes should be buffered.
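
The proposal above (buffer in memory, plan a striped layout, write containers in parallel) can be illustrated with a small Python sketch. This is not Kudu code and does not use the LogBlockManager API: `BufferedFlush`, `plan_layout`, `commit`, `demo`, the round-robin striping policy, and the `container.data` file name are all hypothetical names chosen for illustration, and `os.fsync` stands in for `fdatasync()`. The 64MB default mirrors the per-thread reservation suggested in the issue.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class BufferedFlush:
    """Hypothetical helper: holds column blocks in memory, then writes them
    striped across one container file per data directory."""

    def __init__(self, data_dirs, budget_bytes=64 * 1024 * 1024):
        self.data_dirs = list(data_dirs)
        self.budget_bytes = budget_bytes  # assumed per-thread reservation
        self.blocks = []                  # (block_id, bytes) held in memory
        self.buffered_bytes = 0

    def append_block(self, block_id, data):
        # Refuse to grow past the reserved buffer space.
        if self.buffered_bytes + len(data) > self.budget_bytes:
            raise MemoryError("flush buffer budget exceeded")
        self.blocks.append((block_id, data))
        self.buffered_bytes += len(data)

    def plan_layout(self):
        # Assumed policy: round-robin stripe of blocks across data dirs.
        layout = {d: [] for d in self.data_dirs}
        for i, block in enumerate(self.blocks):
            layout[self.data_dirs[i % len(self.data_dirs)]].append(block)
        return layout

    def _write_container(self, data_dir, blocks):
        path = os.path.join(data_dir, "container.data")
        with open(path, "ab") as f:
            for _, data in blocks:
                f.write(data)
            f.flush()
            os.fsync(f.fileno())  # one sync per container, not per block
        return 1  # number of sync calls issued for this container

    def commit(self):
        # Write each non-empty container on its own thread; return sync count.
        layout = {d: b for d, b in self.plan_layout().items() if b}
        with ThreadPoolExecutor(max_workers=len(layout) or 1) as pool:
            syncs = sum(pool.map(lambda kv: self._write_container(*kv),
                                 layout.items()))
        self.blocks.clear()
        self.buffered_bytes = 0
        return syncs


def demo(num_columns=280, num_dirs=4, block_size=1024):
    """Flush one block per column across num_dirs simulated data disks."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(num_dirs):
            d = os.path.join(root, f"disk{i}")
            os.mkdir(d)
            dirs.append(d)
        flush = BufferedFlush(dirs)
        for col in range(num_columns):
            flush.append_block(col, b"x" * block_size)
        syncs = flush.commit()
        sizes = [os.path.getsize(os.path.join(d, "container.data"))
                 for d in dirs]
        return syncs, sizes
```

In this toy model the number of sync calls scales with the number of data directories rather than with the number of columns. A real implementation would also have to sync whatever metadata the block manager keeps, so the actual savings would depend on details not covered here.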