Mike Percy created KUDU-2693:
--------------------------------
Summary: Buffer DiskRowSet flushes to more efficiently write many
columns
Key: KUDU-2693
URL: https://issues.apache.org/jira/browse/KUDU-2693
Project: Kudu
Issue Type: Improvement
Components: fs, tablet
Affects Versions: 1.9.0
Reporter: Mike Percy
A trace of some MRS flushes on a table with 280 columns showed that some 695
fdatasync() calls occurred during the course of the flush.
One possible way to minimize the number of fdatasync() calls would be to flush
directly to memory buffers first, determine the ideal layout on disk for the
flushed blocks (possibly striped across one log block container per data disk),
and then potentially write the data out to the containers in parallel. This
would require some memory buffer space to be reserved per maintenance manager
thread, possibly 64 MB since the DRS roll size is 32 MB.
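To make the idea concrete, here is a minimal sketch of the difference between syncing each block as it is written and staging all blocks in memory before writing one batch per container. It is a toy model, not Kudu code: Container, FlushUnbuffered, FlushBuffered and TotalSyncs are hypothetical names, the round-robin striping is only one possible layout, and sync calls are counted rather than issued. The per-container writes are done sequentially here, although they could be issued in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: each Container stands in for one log block container on
// one data disk. It only records what would have been written and synced.
struct Container {
  std::size_t bytes_written = 0;
  int sync_calls = 0;
};

// Baseline: every block is written and synced on its own.
inline void FlushUnbuffered(const std::vector<std::string>& blocks,
                            std::vector<Container>* containers) {
  assert(!containers->empty());
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    Container& c = (*containers)[i % containers->size()];
    c.bytes_written += blocks[i].size();
    ++c.sync_calls;  // one fdatasync()-style call per block
  }
}

// Buffered: stage all blocks in memory, stripe them round-robin across the
// containers, then do one write and one sync per non-empty container.
inline void FlushBuffered(const std::vector<std::string>& blocks,
                          std::vector<Container>* containers) {
  assert(!containers->empty());
  std::vector<std::string> staged(containers->size());
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    staged[i % staged.size()] += blocks[i];
  }
  for (std::size_t j = 0; j < staged.size(); ++j) {
    if (staged[j].empty()) continue;
    (*containers)[j].bytes_written += staged[j].size();
    ++(*containers)[j].sync_calls;  // one sync for the whole batch
  }
}

// Sums the sync calls recorded across all containers.
inline int TotalSyncs(const std::vector<Container>& containers) {
  int total = 0;
  for (const Container& c : containers) total += c.sync_calls;
  return total;
}
```

In this model the number of syncs in the buffered case is bounded by the number of containers rather than by the number of blocks, at the cost of holding the whole flush in memory until it is written.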
According to Todd, we could probably do it all in LogBlockManager by adding a
new flag to CreateBlockOptions that says whether to buffer, or something along
those lines.
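A rough sketch of what such a flag could look like follows. Everything in it is an assumption made for illustration: SketchCreateBlockOptions, its buffer_writes field, and SketchBlockManager are hypothetical stand-ins and do not reflect the actual CreateBlockOptions or LogBlockManager interfaces. Syncs are counted, not performed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the options struct. The field name is assumed;
// the ticket only proposes "a new flag" without naming it.
struct SketchCreateBlockOptions {
  bool buffer_writes = false;
};

// Toy block manager that only counts sync calls.
class SketchBlockManager {
 public:
  // Adds a block. Unbuffered blocks are treated as written and synced
  // immediately; buffered blocks are held in memory until FlushBuffered().
  void CreateBlock(const SketchCreateBlockOptions& opts,
                   const std::string& data) {
    if (opts.buffer_writes) {
      pending_ += data;
    } else {
      durable_bytes_ += data.size();
      ++sync_calls_;
    }
  }

  // Writes everything staged so far and accounts for a single sync.
  void FlushBuffered() {
    if (pending_.empty()) return;
    durable_bytes_ += pending_.size();
    pending_.clear();
    ++sync_calls_;
  }

  int sync_calls() const { return sync_calls_; }
  std::size_t durable_bytes() const { return durable_bytes_; }

 private:
  std::string pending_;            // staged, not yet durable
  std::size_t durable_bytes_ = 0;  // bytes accounted as written and synced
  int sync_calls_ = 0;
};
```

Keeping the switch in the options struct would let existing callers keep the unbuffered default while a flush path opts in to buffering.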