I agree, even if we don’t manage the optimal zero conversion. 

I am also not entirely convinced we need to worry about compatibility for FQL and other logs - we can just say you must use the version of C* tools that produced the log. I would even be fine with saying this isn’t guaranteed to be compatible across minor versions (at least for FQL, perhaps not audit logging). But we can and should be explicit about our guarantees for each file we produce.

Most of this is not meant to be baked into production workflows, and where it is, people can use the command line tools bundled with C*. Programmatic consumers are primarily going to be us here on this list. Let’s not burden ourselves unnecessarily.

On 19 Sep 2024, at 22:17, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:


I think reserializing the payload into a new format is counterproductive to some of the performance goals of the binary logs?
If you have to deserialize and reserialize every message, you are going to generate a ton of extra garbage for GC.
I think we have done a lot of work in recent versions to reduce the amount of re-serialization that happens in the query paths?  Not sure we want to add some back in on purpose?  Keeping the payload in the internal serialization format does indeed have the drawbacks David mentioned, but I think “zero serialization overhead” is a pretty big advantage to keeping things that way?

-Jeremiah

On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič <smikloso...@apache.org> wrote:


I think protobuf output is just a bunch of bytes upon serialization anyway. If we figure out the header as David showed, then we can still serialize it all with the machinery / serializers you mentioned. It can write bytes, right?! I very briefly checked and I think that protobuf is super simple and does not have any checksumming etc., so some sauce on top of it would be necessary anyway, and we can reuse what we have to produce binary files.

On the consumer side, the binary file would be parsed with some tooling, e.g. in Go, indeed, but the headers and such would be so simple that it would be just a coding exercise, and then the payload might be deserialized with protobuf for that language.

Basically, only the payload itself would be the product of protobuf, and everything around it would be super simple to crack through.
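To make the shape of that concrete, here is a minimal sketch of the kind of framing being discussed: the protobuf payload stays opaque bytes, and a tiny header supplies what protobuf itself does not (magic, version, length, checksum). The class name, magic value, and field layout are all hypothetical, purely for illustration - nothing here is the actual FQL/audit on-disk format.

```java
import java.io.*;
import java.util.zip.CRC32;

// Hypothetical framing around an opaque protobuf payload. The wrapper adds
// what protobuf does not provide: magic, version, length, and a checksum.
public class FramedRecord
{
    static final int MAGIC = 0xCA55E77E; // made-up magic number

    static byte[] frame(byte[] payload) throws IOException
    {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        CRC32 crc = new CRC32();
        crc.update(payload);
        out.writeInt(MAGIC);
        out.writeShort(1);              // format version
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());  // checksum over the payload bytes
        out.write(payload);             // opaque protobuf-encoded bytes
        return baos.toByteArray();
    }

    static byte[] unframe(byte[] framed) throws IOException
    {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        if (in.readInt() != MAGIC) throw new IOException("bad magic");
        in.readShort();                 // version, unused in this sketch
        byte[] payload = new byte[in.readInt()];
        long expected = in.readLong();
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != expected) throw new IOException("checksum mismatch");
        return payload;
    }
}
```

A header this simple is the "coding exercise" part for non-Java consumers: a few fixed-width big-endian fields to read before handing the payload to that language's protobuf runtime.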

On Thu, Sep 19, 2024 at 10:41 PM Benedict <bened...@apache.org> wrote:
Sorry, I missed that. I’m not convinced any of these logs need language-agnostic tools for access, but if that’s a goal for other folks I don’t feel strongly about it.

On 19 Sep 2024, at 21:06, Štefan Miklošovič <smikloso...@apache.org> wrote:


What’s more, it is actually not only about FQL. Audit logging is on Chronicle queues too, so inspecting that would be platform-independent as well.

CEP-12 suggests that there might be a persistent store for diagnostic events as well, in case somebody wants to inspect what a node was doing after it went offline; as of now, all these events are in memory only.

This would basically enable people to fully inspect what the cluster was doing, from FQL to Audit to Diagnostics, in a language-independent manner.

On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
I think the biggest selling point for using something like protobuf is what David said - what if he wants to replay it in Go? Basing it on something language-neutral enables people to replay it in whatever they want. If we have something totally custom, then it is replayable just in Java, short of bringing tons of dependencies into their projects. That is the message I got from what he wrote.

On Thu, Sep 19, 2024 at 9:47 PM Benedict <bened...@apache.org> wrote:
Do we need any of these things either? We have our own serialisation framework and file readers and writers, and at least in the FQL case these are the native serialisation format. 

At a cursory glance, it also looks to me like this would be a minimal refactor from the current state.

What is the reason we want to add these other dependencies?


On 19 Sep 2024, at 20:31, Štefan Miklošovič <smikloso...@apache.org> wrote:


Well, the Maven plugin declares that it downloads protoc from Maven Central automatically _somehow_, so coding up an Ant task which does something similar shouldn't be too hard. I will investigate this idea.
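For reference, what the Maven plugin does is resolve the platform-specific protoc binary from Maven Central under the `com.google.protobuf:protoc` coordinates; an Ant `<get>` task pointed at the same URL would mirror that. A small sketch of the URL construction (the version number and classifier below are illustrative, not a recommendation):

```java
// Hypothetical sketch of the lookup an Ant task would have to do: protoc
// binaries live on Maven Central under com.google.protobuf:protoc, published
// with an .exe artifact extension for every platform (not just Windows).
public class ProtocUrl
{
    static String protocUrl(String version, String classifier)
    {
        return "https://repo1.maven.org/maven2/com/google/protobuf/protoc/"
             + version + "/protoc-" + version + "-" + classifier + ".exe";
    }

    public static void main(String[] args)
    {
        // classifiers include linux-x86_64, osx-aarch_64, windows-x86_64, ...
        // After downloading, the binary needs the executable bit set on
        // Linux/macOS before <exec> can invoke it.
        System.out.println(protocUrl("3.25.5", "linux-x86_64"));
    }
}
```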

On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams <dri...@gmail.com> wrote:
On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
<smikloso...@apache.org> wrote:
> Unfortunately there is nothing like that for Ant, protoc would need to be a local dependency on the computer which compiles the project to be able to do that so that is kind of a dead end. Or is there any workaround here?

In the old Thrift days I believe we generated the code and checked it
in, so you didn't need to compile locally.

Kind Regards,
Brandon
