> On 21 Oct 2016, at 17:34, Kristian Nielsen <kniel...@knielsen-hq.org> wrote:
> 
> Simon Mudd <simon.m...@booking.com> writes:
> 
>>> This would result in higher overhead on each event. There is a fixed header
> 
>> Ok. I’ve been assuming the headers were small (from some casual browsing of 
>> things
>> related to the binlog router some time ago), but that may be wrong.
> 
> Yes, they are quite small, 10-20 bytes per event or something like that.
> 
>> Indeed, one of the things about the current binlog format is that there’s 
>> little
>> complete documentation outside of the code. Code changes and there’s no
>> clear specification. It makes things much better if what’s currently 
>> implicit is
>> explicit and also if the specs are outside of the code.  That’s something I
> 
> Tell me about it ... it is _very_ hard to change most anything in
> replication without breaking some odd corner somewhere.
> 
>> Fixing the case for RBR is good but I feel the focus may be too narrow,
>> especially if the approach can be used more generically.
>> 
>> I certainly have some SBR machines which generate large volumes of bin logs
>> and to be able to compress the events they generate on disk would be most 
>> helpful.
> 
> Right. This patch compresses query events (ie. statement-based updates) and
> row-events, so both of these are covered. LOAD DATA INFILE in statement mode
> is not (but in row-based mode it should be, I think).

I use LOAD DATA INFILE, which is great on the box you load the data into but
awful on a downstream slave that “streams down” the data, only to write it to a
temporary file which is then loaded back in again…
[ The design is logical, but I’d love to see the LOAD DATA INFILE turned
directly into an RBR binlog stream, certainly not by default but as an option,
as that should reduce load on the downstream slaves, which would not have to
reprocess the input stream as they do now. ]

I also have “big inserts” of, say, 16 MB SBR events, often of the type: INSERT
INTO … VALUES … [ON DUPLICATE KEY UPDATE …].
For usage such as this the “text” is big, so the individual event compresses
pretty well and the win would be large.
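As a rough illustration of that claim (this is just a sketch using zlib on a
synthetic statement, not anything from the patch itself; table and column names
are made up), a large repetitive INSERT body of the kind SBR would log
compresses very well:

```python
import zlib

# Build a synthetic multi-row INSERT of the kind described above:
# many structurally similar VALUES tuples, as SBR would log them.
rows = ",".join("(%d,'user_%d','active')" % (i, i) for i in range(100000))
event_body = ("INSERT INTO t (id, name, state) VALUES " + rows +
              " ON DUPLICATE KEY UPDATE state=VALUES(state)").encode()

compressed = zlib.compress(event_body, 6)  # default-ish compression level

print("original:   %d bytes" % len(event_body))
print("compressed: %d bytes (%.1f%% of original)"
      % (len(compressed), 100.0 * len(compressed) / len(event_body)))
```

On text like this the compressed size typically ends up at a small fraction of
the original, which is why per-event compression pays off for these workloads.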

So compression is good and there are several use cases; making it as generic as
possible would therefore benefit more people. That may not be appropriate for
this suggested patch, and it’s good to see people offering solutions to their
own issues, but perhaps it could at least be considered as future
functionality.

A side effect of a more generic mechanism would hopefully be that this _same_
mechanism could be implemented upstream and would work even if the internal
events that go through the “compression pipeline” are different. That avoids
feature drift or dual incompatible implementations, which would not be good and
has already happened (GTID).

Anyway, perhaps I’ve drifted off-topic from your comments on the patch, but
this certainly “woke me up” … :-)

Simon
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp
