> I think reserializing the payload into a new format is counter productive to
> some of the performance goals of the binary logs? If you have to deserialize
> and reserialize the message you are going to be throwing off a ton of extra
> GC.

That isn’t what happens in FQL =D.

FQL creates a custom payload using Chronicle fields, then serializes QueryOptions (we have in-memory objects we use for the query). We are not taking the client network bytes and saving them to a log (client bytes could be different pages… this would be annoying to support); we are working with the following:

String query
ByteBuffer[] binds
QueryOptions options

“Could” we use our networking serializer? Sure, but then what do we get? The cost to construct the object to pass to the serializer is basically the same, so the only difference is the time it takes to serialize it, and I argue that in this very specific case the costs are not really noticeable (and I have benchmarked...). So we would put a burden on users (and on us, to maintain binary compatibility with QueryOptions), making it harder for them, all to save a few nanoseconds of serialization time?

> On Sep 19, 2024, at 3:32 PM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>
> Wow this is quite a rabbit hole.
>
> What is ultimately going to be written into Chronicle Queue is what
> writeMarshallablePayload method on AbstractLogQuery puts into that WireOut.
> If we take e.g. QUERY_OPTIONS into consideration, then it writes it into
> queryOptionsBuffer which is populated in AbstractLogEntry's constructor
> (QueryOptions.codec.encode).
>
> That takes QueryOptions, deserialized stuff we got from QueryMessage via
> codec in its decode method, and it encodes it back to ByteBuf. So for now, we
> just serialize what we deserialized all over again.
>
> But for what reason do we need to serialize it again upon logging it to FQL?
> "body" here which is used for decoding bytes to QueryOptions is ByteBuf
> already. So if we go to write bytes then well here we have it. It does not
> seem to be necessary to decode / encode, just use these bytes as they are?
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L51
>
> On Thu, Sep 19, 2024 at 11:50 PM Benedict Elliott Smith <bened...@apache.org> wrote:
>> Well, that looks like item number one to fix when we change the
>> serialisation format. We should clearly not duplicate query strings we have
>> recently logged.
>>
>> We do however appear to also serialise the bind variables, which benefit
>> from being in the format we already have available in memory.
>>
>>> On 19 Sep 2024, at 22:26, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>
>>> I am not sure what you mean. I mean, I do, but not following. Look into
>>> FullQueryLogger (1) what it goes to put into CQL is a query like String
>>> wrapped in a Query object. It literally take a String as a representation
>>> of a query a user executed. We just replace this by serializing that query
>>> to protobuf. What is counter productive? We just replace one thing for
>>> another. Audit message / events would be similar.
>>>
>>> (1)
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321
>>>
>>> On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>>> I think reserializing the payload into a new format is counter productive
>>>> to some of the performance goals of the binary logs?
>>>> If you have to deserialize and reserialize the message you are going to be
>>>> throwing off a ton of extra GC.
>>>> I think we have done a lot of work in recent version to reduce the amount
>>>> of re-serialization that happens in the query paths? Not sure we want to
>>>> add some back in on purpose?
>>>> Keeping the payload in the internal
>>>> serialization format does indeed have the drawbacks David mentioned, but I
>>>> think “zero serialization overhead” is a pretty big advantage to keeping
>>>> things that way?
>>>>
>>>> -Jeremiah
>>>>
>>>>> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>
>>>>> I think protobuf upon serialization is just a bunch of bytes anyway. If
>>>>> we figure out the header as David showed then we can still serialize it
>>>>> all with the machinery / serializers you mentioned. It can write bytes,
>>>>> right?! I very briefly checked and I think that protobuf is super simple
>>>>> and does not have any checksumming etc. so some sauce on top of that
>>>>> would be necessary anyway and we can reuse what we have to produce binary
>>>>> files.
>>>>>
>>>>> On the consumer side, the binary file would be parsed with some tooling
>>>>> e.g. in Go, indeed, but the headers and stuff would be so simple that it
>>>>> would be just a coding exercise and then it might be deserialized with
>>>>> protobuf for that language.
>>>>>
>>>>> Basically, only the payload itself would be the product of protobuf and
>>>>> all around super simple to crack through.
>>>>>
>>>>> On Thu, Sep 19, 2024 at 10:41 PM Benedict <bened...@apache.org> wrote:
>>>>>> Sorry, I missed that. I’m not convinced any of these logs need language
>>>>>> agnostic tools for access, but if that’s a goal for other folk I don’t
>>>>>> feel strongly about it.
>>>>>>
>>>>>>> On 19 Sep 2024, at 21:06, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>
>>>>>>> More to it, it is actually not only about FQL. Audit logging is on
>>>>>>> Chronicle queues too so inspecting that would be platform independent
>>>>>>> as well.
>>>>>>>
>>>>>>> CEP-12 suggests that there might be a persistent store for diagnostic
>>>>>>> events as well.
>>>>>>> If somebody wants to inspect what a node was doing
>>>>>>> after it went offline as for now all these events are in memory only.
>>>>>>>
>>>>>>> This would basically enable people to fully inspect what the cluster
>>>>>>> was doing from FQL to Audit to Diagnostics in a language independent
>>>>>>> manner.
>>>>>>>
>>>>>>> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>> I think the biggest selling point for using something like protobuf is
>>>>>>>> what David said - what if he wants to replay it in Go? Basing it on
>>>>>>>> something language neutral enables people to replay it in whatever
>>>>>>>> they want. If we have something totally custom then it is replayable
>>>>>>>> just in Java without bringing tons of dependencies to their projects.
>>>>>>>> That is the message I got from what he wrote.
>>>>>>>>
>>>>>>>> On Thu, Sep 19, 2024 at 9:47 PM Benedict <bened...@apache.org> wrote:
>>>>>>>>> Do we need any of these things either? We have our own serialisation
>>>>>>>>> framework and file readers and writers, and at least in the FQL case
>>>>>>>>> these are the native serialisation format.
>>>>>>>>>
>>>>>>>>> At cursory glance it also looks to me like this would be a minimal
>>>>>>>>> refactor from the current state.
>>>>>>>>>
>>>>>>>>> What is the reason we want to add these other dependencies?
>>>>>>>>>
>>>>>>>>>> On 19 Sep 2024, at 20:31, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> well the Maven plugin declares that it downloads protoc from Maven
>>>>>>>>>> Central automatically _somehow_ so coding up an Ant task which does
>>>>>>>>>> something similar shouldn't be too hard. I will investigate this
>>>>>>>>>> idea.
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams <dri...@gmail.com> wrote:
>>>>>>>>>>> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>>>>> > Unfortunately there is nothing like that for Ant, protoc would
>>>>>>>>>>> > need to be a local dependency on the computer which compiles the
>>>>>>>>>>> > project to be able to do that so that is kind of a dead end. Or
>>>>>>>>>>> > is there any workaround here?
>>>>>>>>>>>
>>>>>>>>>>> In the old thrift days I believe we generated the code and checked it
>>>>>>>>>>> in so you didn't need to compile locally.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>> Brandon
>>
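
For reference, the "simple header plus opaque payload, with some sauce on top for checksumming" idea raised further down the thread could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name FramedRecord, the 8-byte header layout (4-byte length, then 4-byte CRC32) and the choice of CRC32 are hypothetical and are not what Cassandra, Chronicle Queue or protobuf actually do. The payload is treated as plain bytes, so it could equally be protobuf output or the existing internal format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical framing sketch: [int length][int crc32][payload bytes], big-endian.
public class FramedRecord {
    // Assumed header size: 4 bytes of payload length + 4 bytes of CRC32.
    public static final int HEADER_BYTES = 8;

    // Wraps an already-serialized payload in the header; the payload is not re-encoded.
    public static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        out.putInt(payload.length);
        out.putInt((int) crc.getValue());
        out.put(payload);
        return out.array();
    }

    // Reads the header, verifies the checksum and returns the payload bytes.
    public static byte[] unframe(byte[] record) {
        ByteBuffer in = ByteBuffer.wrap(record);
        if (in.remaining() < HEADER_BYTES)
            throw new IllegalArgumentException("truncated header");
        int length = in.getInt();
        int expectedCrc = in.getInt();
        if (length < 0 || length > in.remaining())
            throw new IllegalArgumentException("truncated payload");
        byte[] payload = new byte[length];
        in.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != expectedCrc)
            throw new IllegalStateException("checksum mismatch");
        return payload;
    }

    public static void main(String[] args) {
        // Stand-in payload; in practice this would be whatever serializer output is chosen.
        byte[] payload = "SELECT * FROM ks.tbl WHERE id = ?".getBytes(StandardCharsets.UTF_8);
        byte[] record = frame(payload);
        System.out.println(new String(unframe(record), StandardCharsets.UTF_8));
    }
}
```

A reader in another language would only need to reimplement unframe (two integers and a CRC32 check) before handing the payload to its own decoder, which is the "just a coding exercise" part of the argument.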