Well, that looks like item number one to fix when we change the serialisation 
format. We should clearly not duplicate query strings we have recently logged.

We do however appear to also serialise the bind variables, which benefit from 
being in the format we already have available in memory.
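
To make the first point concrete, here is a rough sketch of one way to avoid
rewriting query strings we have recently logged. It is an illustration only:
RecentQueryIds, lookup, register and UNKNOWN are hypothetical names, not
existing Cassandra code, and the bounded LRU table is just one possible
approach. Bind variables would still be written per record in whatever format
we already hold in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not an existing Cassandra class): remembers the ids of
 * recently logged query strings so a log record can carry a small id instead
 * of repeating the full text.
 */
final class RecentQueryIds {
    /** Returned by lookup() when the query has not been logged recently. */
    static final int UNKNOWN = -1;

    private final Map<String, Integer> ids;
    private int nextId = 0;

    RecentQueryIds(final int capacity) {
        // accessOrder = true: iteration order is least recently used first,
        // so removeEldestEntry evicts the query that was used longest ago.
        this.ids = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Id of a recently logged query, or UNKNOWN if the full string must be written. */
    int lookup(String query) {
        Integer id = ids.get(query); // get() also marks the entry as recently used
        return id == null ? UNKNOWN : id;
    }

    /** Assigns a fresh id; call after writing the full query string to the log. */
    int register(String query) {
        int id = nextId++;
        ids.put(query, id);
        return id;
    }
}
```

A writer would call lookup() first and fall back to writing the full string
plus register() on UNKNOWN. Any reader would have to maintain the same table
with the same capacity to resolve ids, which is part of the format design and
not shown here.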

> On 19 Sep 2024, at 22:26, Štefan Miklošovič <smikloso...@apache.org> wrote:
> 
> I am not sure what you mean. I mean, I do, but I am not following. Look into 
> FullQueryLogger (1): what it puts into CQL is a query as a String 
> wrapped in a Query object. It literally takes a String as a representation of 
> the query a user executed. We would just replace this by serializing that query 
> to protobuf. What is counterproductive? We just swap one thing for another. 
> Audit messages / events would be similar. 
> 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321
> 
> On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>> I think reserializing the payload into a new format is counterproductive to 
>> some of the performance goals of the binary logs?
>> If you have to deserialize and reserialize the message you are going to be 
>> throwing off a ton of extra GC.
>> I think we have done a lot of work in recent versions to reduce the amount of 
>> re-serialization that happens in the query paths?  Not sure we want to add 
>> some back in on purpose?  Keeping the payload in the internal serialization 
>> format does indeed have the drawbacks David mentioned, but I think “zero 
>> serialization overhead” is a pretty big advantage to keeping things that way?
>> 
>> -Jeremiah
>> 
>>> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>> 
>>> 
>>> I think protobuf upon serialization is just a bunch of bytes anyway. If we 
>>> figure out the header as David showed then we can still serialize it all 
>>> with the machinery / serializers you mentioned. It can write bytes, right?! 
>>> I very briefly checked and I think that protobuf is super simple and does 
>>> not have any checksumming etc. so some sauce on top of that would be 
>>> necessary anyway and we can reuse what we have to produce binary files.
>>> 
>>> On the consumer side, the binary file would be parsed with some tooling 
>>> e.g. in Go, indeed, but the headers and stuff would be so simple that it 
>>> would be just a coding exercise and then it might be deserialized with 
>>> protobuf for that language.
>>> 
>>> Basically, only the payload itself would be the product of protobuf and 
>>> everything around it would be super simple to parse.
>>> 
>>> On Thu, Sep 19, 2024 at 10:41 PM Benedict <bened...@apache.org> wrote:
>>>> Sorry, I missed that. I’m not convinced any of these logs need language 
>>>> agnostic tools for access, but if that’s a goal for other folk I don’t 
>>>> feel strongly about it.
>>>> 
>>>>> On 19 Sep 2024, at 21:06, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>> 
>>>>> 
>>>>> More to it, it is actually not only about FQL. Audit logging is on 
>>>>> Chronicle queues too so inspecting that would be platform independent as 
>>>>> well. 
>>>>> 
>>>>> CEP-12 suggests that there might be a persistent store for diagnostic 
>>>>> events as well. That would help if somebody wants to inspect what a node 
>>>>> was doing after it went offline, as for now all these events are in 
>>>>> memory only.
>>>>> 
>>>>> This would basically enable people to fully inspect what the cluster was 
>>>>> doing from FQL to Audit to Diagnostics in a language independent manner. 
>>>>> 
>>>>> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>> I think the biggest selling point for using something like protobuf is 
>>>>>> what David said - what if he wants to replay it in Go? Basing it on 
>>>>>> something language neutral enables people to replay it in whatever they 
>>>>>> want. If we have something totally custom then it is replayable just in 
>>>>>> Java without bringing tons of dependencies to their projects. That is 
>>>>>> the message I got from what he wrote. 
>>>>>> 
>>>>>> On Thu, Sep 19, 2024 at 9:47 PM Benedict <bened...@apache.org> wrote:
>>>>>>> Do we need any of these things either? We have our own serialisation 
>>>>>>> framework and file readers and writers, and at least in the FQL case 
>>>>>>> these are the native serialisation format. 
>>>>>>> 
>>>>>>> At a cursory glance it also looks to me like this would be a minimal 
>>>>>>> refactor from the current state.
>>>>>>> 
>>>>>>> What is the reason we want to add these other dependencies?
>>>>>>> 
>>>>>>> 
>>>>>>>> On 19 Sep 2024, at 20:31, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Well, the Maven plugin declares that it downloads protoc from Maven 
>>>>>>>> Central automatically _somehow_, so coding up an Ant task which does 
>>>>>>>> something similar shouldn't be too hard. I will investigate this idea. 
>>>>>>>> 
>>>>>>>> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams <dri...@gmail.com> wrote:
>>>>>>>>> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
>>>>>>>>> <smikloso...@apache.org> wrote:
>>>>>>>>> > Unfortunately there is nothing like that for Ant; protoc would need 
>>>>>>>>> > to be a local dependency on the computer which compiles the project 
>>>>>>>>> > to be able to do that, so that is kind of a dead end. Or is there 
>>>>>>>>> > any workaround here?
>>>>>>>>> 
>>>>>>>>> In the old thrift days I believe we generated the code and checked it
>>>>>>>>> in so you didn't need to compile locally.
>>>>>>>>> 
>>>>>>>>> Kind Regards,
>>>>>>>>> Brandon
