> Dominik, wasn't the original idea for VAST to provide an event
> description language that would create the link between the values
> coming over the wire and their interpretation? Such a specification
> could be auto-generated from Bro's knowledge about the events it
> generates.
We were actually thinking about auto-generating the schema. But broker::data
simply has no meta information that we can use. Even distinguishing
records/tuples from actual lists is impossible, because broker::vector is used
for both. Of course we can make a couple of assumptions (e.g., that the
top-level vector is a record), but then VAST users can only ever use type
queries. In other words, they can ask for IP addresses in general, but not
specifically for originator IPs.
In a sense, broker’s representation is an inverted JSON: JSON gives us field
names but hardly any type information, whereas broker gives us (ambiguous)
type information but no field names. :)
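To make that concrete, here is a minimal sketch (made-up values, nothing from
VAST) of what a consumer actually sees on the wire:

    #include <broker/data.hh>

    #include <string>

    int main() {
      using broker::data;
      // Meant by the sender as a record, e.g. {proto: string, pkts: count}:
      broker::vector as_record{data{std::string{"tcp"}}, data{broker::count{42}}};
      // Meant by the sender as a plain heterogeneous list:
      broker::vector as_list{data{std::string{"tcp"}}, data{broker::count{42}}};
      // Both values are indistinguishable std::vector<broker::data>
      // instances: the element types survive, but the field names and the
      // record-vs-list distinction do not. A consumer can therefore answer
      // "give me all addresses" (a type query) but never "give me all
      // originator addresses" (a field query).
    }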
>> Though the Broker data corresponding to log entry content is also
>> opaque at the moment (I recall that was maybe for performance or
>> message volume optimization),
>
> Yeah, but generally this is something I could see opening up. The log
> structure is pretty straight-forward and self-describing, it'd be
> mostly a matter of clean up and documentation to make that directly
> accessible to external consumers, I think. Events, on the other hand,
> are semantically tied very closely to the scripts generating them, and
> also much more diverse so that self-description doesn't really seem
> feasible/useful. Republishing a relevant subset certainly sounds
> better for that; or, if it's really a bulk feed that's desired, some
> out-of-band mechanism to convey the schema information somehow.
Opening that up would be great.
However, our goal was to have Broker as a source of structured data that we
can import in a generic fashion for later analysis. Of course that relies on a
standard / convention / best practice for making the schema programmatically
accessible. Currently, it seems that we need a schema definition provided by
the user offline. This will work as long as all published data for a given
topic is uniform. Multiplexing multiple event types already complicates
things, but it seems like this is actually the standard use case. OSQuery, for
example, will generate different events that we then either need to separate
into different topics, or multiplex in a single topic while merging in some
meta information. And once we mix meta information with actual data, a simple
schema definition no longer cuts it. At worst, importing data from Broker
requires a separate parser for each import format.
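For illustration, a generic importer would end up with a dispatch roughly like
the sketch below. The convention of a leading event-name string and the
osquery event names are hypothetical, not something Broker defines today:

    #include <broker/data.hh>

    #include <iostream>
    #include <string>

    // Hypothetical convention: each published value is a broker::vector
    // whose first element names the event, so a generic importer can pick
    // a per-event parser.
    void import(const broker::data& msg) {
      auto xs = broker::get_if<broker::vector>(msg);
      if (!xs || xs->empty())
        return; // value does not follow the convention
      auto name = broker::get_if<std::string>(xs->front());
      if (!name)
        return; // no embedded meta information
      // Dispatch on the event name; each event type still needs its own
      // hand-written schema/parser.
      if (*name == "osquery::process_event") {
        // parse_process_event(*xs);  (hypothetical per-event parser)
      } else if (*name == "osquery::socket_event") {
        // parse_socket_event(*xs);   (hypothetical per-event parser)
      } else {
        std::cerr << "no parser for " << *name << '\n';
      }
    }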
> broker/bro.hh is basically all there is right now
I’m a bit hesitant to rely on this header at the moment, because of:
/// A Bro log-write message. Note that at the moment this should be used only
/// by Bro itself as the arguments aren't publicly defined.
Is the API stable enough on your end at this point to make it public? Also,
there are LogCreate and LogWrite events. LogCreate carries a `fields_data`
argument (a list of field names?). Does that mean I need to receive the
LogCreate event first in order to understand subsequent LogWrite events? That
would mean I cannot parse logs whose LogCreate event fired before I was able
to subscribe to the topic.
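For context, this is roughly the state I would have to keep on my end,
assuming the accessor names in broker/bro.hh stay as they are and
`fields_data` really is the list of field names:

    #include <broker/bro.hh>
    #include <broker/data.hh>

    #include <map>
    #include <string>

    // Remember the fields_data of every LogCreate, keyed by stream id.
    std::map<std::string, broker::data> schemas;

    void on_log_create(const broker::bro::LogCreate& lc) {
      schemas[broker::to_string(lc.stream_id())] = lc.fields_data();
    }

    bool on_log_write(const broker::bro::LogWrite& lw) {
      auto it = schemas.find(broker::to_string(lw.stream_id()));
      if (it == schemas.end())
        return false; // subscribed too late: the LogCreate already went by
      // ... interpret lw.serial_data() using it->second ...
      return true;
    }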
Dominik