Or you could just put it on ZFS and let it do all of the difficult stuff for you. :-)
On Thu, Aug 24, 2023 at 6:40 PM pslawek83 <[email protected]> wrote:
>
> Hi Marko, thanks for the response... So, for the docs part:
>
> > https://mariadb.com/kb/en/innodb-page-compression/
> > When InnoDB page compression is used, InnoDB may still write the compressed page to the tablespace file with the original size of the uncompressed page, which would be equivalent to the value of the innodb_page_size system variable. This is done by design, because when InnoDB's I/O code needs to read the page from disk, it can only read the full page size. However, this is obviously not optimal.
> > This punch hole technique allows InnoDB to read the compressed page from disk as the full page size, even though the compressed page really takes up less space on the file system.
>
> So, as far as I understand, if compression is enabled the server will write e.g. a 40%-utilized page, which will later be decompressed when read, but this compression/decompression won't have any advantage if there's no punch-hole support. The only difference will be that the block is filled with zeros in the empty space at the end. Or is it possible for PAGE_COMPRESSION to still be used without punch hole (just not as effective, for example)?
>
> I also tested row compression, and yes, page compression seems much better. Is it possible to get rid of punch hole by just re-mapping the pages inside these InnoDB files? I was thinking about this... It might be possible to change the data structure to divide the file into 512-byte blocks, where the beginning of each block marks the page number (0xffffffff for blocks that are a continuation of the previous page). To increase a page's size we'd need to add another block with the same page number at the end of the file; later, contiguous recycled blocks could be added to a list and used to store smaller pages. To make this atomic, the whole block would first be allocated and written with 0xfffffffe as a dummy id, and then this placeholder id could be replaced with the real block number.
>
> The problem with this could be that the file would get 100% fragmented over time (but it would probably still work fine on NVMe). The mapping could be rebuilt by reading the whole data file during server startup, and it would need about 125 MB of memory per 1 TB of compressed data to keep the mapping in memory for fast access.
>
> What do you think? Would it be easy to add on top of the current code (if there's an API, maybe, so that for example I could do this myself), and would it even make sense?
>
> Best,
> Slawomir.
>
> On 21 August 2023 at 07:08 Marko Mäkelä <[email protected]> wrote:
>
> > On Sun, Aug 20, 2023 at 6:12 PM pslawek83 via discuss <[email protected]> wrote:
> >
> > > Thanks everyone for the answers. I tested PAGE_COMPRESSION a bit and did some reading. I'm not sure if I fully understand how it works... Is the following correct?
> > >
> > > 1. It can't work without punch hole, because if we don't have it the server will just write partially used pages which take up the full page size anyway. But the docs seem to describe its use with punch hole disabled. What's the reason to still have page compression without punch hole?
> >
> > Can you please point to the misleading part of the documentation so that it can be corrected?
> >
> > > 2. After enabling compression, each page needs to be trimmed using punch hole, so if we have a 1 TB database and 64k blocks, the database files will have over 15 million fragments (1000000000000/64000).
> >
> > That is correct.
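
For readers unfamiliar with it: the "punch hole" being discussed is, on Linux, the fallocate(2) call with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, which lets InnoDB page compression deallocate the unused tail of a page while the file's apparent size stays unchanged. A rough sketch of the idea follows; it is not the actual InnoDB code, and the file name, page size, filesystem block size and compressed length are made-up examples.

    /* Sketch of the "punch hole" idea behind InnoDB page compression.
     * Not the actual InnoDB code; all names and sizes are illustrative.
     * Linux-specific: fallocate(2) with FALLOC_FL_PUNCH_HOLE. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t page_size      = 16384; /* innodb_page_size */
        const size_t fs_block       = 4096;  /* filesystem block size */
        const size_t compressed_len = 6000;  /* pretend the page compressed to this */
        const off_t  page_offset    = 0;     /* where the page lives in the file */

        int fd = open("demo.ibd", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Write the full 16 KiB page: compressed payload followed by zero padding. */
        unsigned char *page = calloc(1, page_size);
        if (!page) return 1;
        memset(page, 'x', compressed_len);   /* stand-in for the compressed bytes */
        if (pwrite(fd, page, page_size, page_offset) != (ssize_t) page_size) {
            perror("pwrite"); return 1;
        }

        /* Deallocate the unused tail. Only whole filesystem blocks can be
         * released, so round the start of the hole up to the next fs block. */
        off_t hole_start = page_offset
            + (off_t) ((compressed_len + fs_block - 1) / fs_block) * fs_block;
        off_t hole_len = page_offset + (off_t) page_size - hole_start;
        if (hole_len > 0
            && fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         hole_start, hole_len) != 0) {
            perror("fallocate");  /* e.g. EOPNOTSUPP on filesystems without hole punching */
        }

        /* The file still reports page_size bytes (st_size), but `du` or
         * stat.st_blocks will show that less space is actually allocated. */
        free(page);
        close(fd);
        return 0;
    }

Because holes can only be punched in whole filesystem blocks, each compressed page saves space only in multiples of the filesystem block size, and every punched page can become a separate extent, which is where the millions of file fragments discussed in this thread come from.
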
> > > So if we get, say, 50% compression and we're then able to put another 1 TB of data into these holes, we'll have close to 31 million file fragments. I'm not sure whether under such conditions the FS will even be able to allocate space efficiently anymore, copy these files, etc. Or am I missing something?
> >
> > For what it is worth, around the same time that page_compressed was being developed for MariaDB Server 10.1, something similar was being worked on at Oracle, I think for MySQL 5.7. To my understanding, the same hardware partner worked with both companies. The obvious answer to https://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html would have been https://en.wikipedia.org/wiki/Fusion-io_NVMFS if it had worked without kernel panics or existed in a mainline kernel. I don't even know if it was ever open sourced.
> >
> > > 3. I was testing compression of a ~1.5 GB table with nice results on the default 16 KB page size. One thing I noticed is that I was always getting exactly the same compressed size according to `du`. However, the table read/copy times varied from 40 s uncompressed to 80-400 s depending on the algorithm. Is it possible that this is because the page size is too small?
> >
> > As far as I remember, most compression algorithms that are derivatives of the LZ77 family have an input buffer size of 32 KiB. From that point of view, it would seem to make sense to use innodb_page_size=32k or innodb_page_size=64k instead of the default. Furthermore, I think that it was a mistake to implement bzip2 support at all. That algorithm would work with much larger buffer sizes.
> >
> > > 4. MyRocks seems to be very slow compared to compressed InnoDB. I was also reading the docs related to it. For its use case, is it more suitable for storing large rows rather than many small rows?
> >
> > To my understanding, the LSM trees that MyRocks is built on work best in an insert-only workload. On top of LSM trees, MyRocks implements some caches and Bloom filters. If you have few updates or deletes and most reads access recently inserted data, it could work.
> >
> > > 5. I remember reading that some InnoDB compression is going to become obsolete. Is that row compression? Or are there no plans to obsolete any compression model?
> >
> > I wanted to deprecate and remove the ROW_FORMAT=COMPRESSED format that I designed and implemented in the InnoDB Plugin for MySQL 5.1, based on some high-level ideas of Heikki Tuuri, between 2005 and 2007. https://jira.mariadb.org/browse/MDEV-22367 was the first step towards that. Based on community feedback, this won't be happening. Even after some data structure cleanup that took place in MariaDB Server 10.6, supporting the format incurs an overhead of some 32 to 40 bytes per buffer page descriptor, which is not insignificant. Removing both ROW_FORMAT=COMPRESSED and the adaptive hash index would shrink the block descriptor to 80 bytes from its current 10.6 size of 160 bytes. The write performance of ROW_FORMAT=COMPRESSED should have degraded when I implemented the easier-to-parse log record format in MDEV-12353 (MariaDB Server 10.5). There are open bugs about ROW_FORMAT=COMPRESSED, such as https://jira.mariadb.org/browse/MDEV-31574. I would say that InnoDB ROW_FORMAT=COMPRESSED seemed like a good idea back when it was implemented, when HDD storage was the only practical option.
> > Just like the InnoDB change buffer, which I removed in MariaDB Server 11.0 after having to deal with too many cases of corruption: https://fosdem.org/2023/schedule/event/innodb_change_buffer/
> >
> > With best regards,
> > Marko
> >
> > --
> > Marko Mäkelä, Lead Developer InnoDB
> > MariaDB plc
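
A side note on the `du` observation in point 3 above: punching holes reduces the space allocated to a file but not its apparent size, so the savings show up in stat(2)'s st_blocks (which is what `du` counts) rather than in st_size. A small sketch for checking this on a given tablespace file; the fallback path used here is only an illustrative example.

    /* Compare a file's apparent size with the space actually allocated on disk,
     * e.g. for an InnoDB tablespace using PAGE_COMPRESSED with hole punching.
     * The fallback path below is only an illustrative example. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "test/t1.ibd";
        struct stat st;

        if (stat(path, &st) != 0) {
            perror(path);
            return 1;
        }

        long long apparent  = (long long) st.st_size;          /* what ls -l reports */
        long long allocated = (long long) st.st_blocks * 512;  /* what du counts */

        printf("%s: apparent %lld bytes, allocated %lld bytes (%.1f%% allocated)\n",
               path, apparent, allocated,
               apparent ? 100.0 * allocated / apparent : 0.0);
        return 0;
    }

On Linux, running filefrag on the same file reports how many extents it consists of, which gives a rough idea of the fragmentation discussed earlier in the thread.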
