Hi Devs,

Recently I was doing ingestion experiments and found that our default log buffer size (1MB = 8 pages * 128KB page size) is too small and negatively impacts ingestion performance. The short conclusion is that simply by increasing the log buffer size (e.g., to 32MB), I can improve ingestion performance by *50% ~ 100%* on a single-node sensorium machine.
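If you want to try this yourself, it is just a config change. Something like the following should give a 32MB log buffer (I am writing the txn.log.* keys from memory, so please double-check the exact names and the section in your cc.conf before copying):

    [common]
    txn.log.buffer.numpages=256
    txn.log.buffer.pagesize=131072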
The detailed explanation is as follows. Right now we have a background LogFlusher thread that continuously forces log records to disk. When the log buffer is full, writers are blocked waiting for log buffer space.

However, when setting the log buffer size, we have to consider the LSM operations as well. The memory component is first filled up with incoming records at a very high speed, and is then flushed to disk at a relatively low speed. If the log buffer is small, ingestion is very likely to be blocked by the LogFlusher while the memory component is filling up. This blocking is wasted time, since quite often flush/merge is idle at that point. If the log buffer is relatively large, the LogFlusher can instead catch up while ingestion is blocked by flush/merge, which is not harmful since there are ongoing LSM I/O operations anyway.

I don't know exactly how large the log buffer should be (it depends on various factors), but our default value of *1MB* is very likely small enough to cause blocking during normal ingestion. Just letting you know so you are aware of this parameter when you measure ingestion performance.
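To make the blocking behavior concrete, here is a minimal, self-contained sketch (hypothetical code, not our actual LogManager; all names are made up) that models the log buffer as a bounded queue of pages, with a writer that ingests in bursts and a flusher that forces pages to "disk" at a fixed, slower rate:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class LogBufferSketch {
        public static void main(String[] args) throws Exception {
            final int numPages = 8; // default: 8 pages * 128KB = 1MB
            final BlockingQueue<byte[]> logBuffer = new ArrayBlockingQueue<>(numPages);

            // Background flusher: continuously forces log pages to disk.
            Thread flusher = new Thread(() -> {
                try {
                    while (true) {
                        logBuffer.take();  // grab one full page
                        Thread.sleep(5);   // simulated disk latency per page
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            flusher.setDaemon(true);
            flusher.start();

            // Each burst fills the memory component at memory speed; the pause
            // afterwards stands in for ingestion being blocked by flush/merge.
            for (int burst = 0; burst < 5; burst++) {
                long start = System.nanoTime();
                for (int i = 0; i < 64; i++) {
                    logBuffer.put(new byte[128 * 1024]); // blocks when buffer is full
                }
                System.out.printf("burst %d took %d ms%n",
                        burst, (System.nanoTime() - start) / 1_000_000);
                Thread.sleep(500); // flush/merge in progress; flusher catches up here
            }
        }
    }

With numPages = 8 each burst is throttled down to the flusher's disk speed even though flush/merge is idle at that moment; bump numPages to 256 and the bursts complete at memory speed, while the flusher quietly catches up during the flush/merge pauses. That is the effect I believe we are seeing in the real experiments.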
Best regards,
Chen Luo