Hey Noam,

I think this would certainly be useful, and thank you for your interest in
contributing!

I think the toughest part will be designing a good API (meaning: what would
users specify in the kafka supervisor json spec in order to activate and
configure this feature?). So a good way to proceed would be to propose some
API, gather some community feedback on the design of the API, and then
start working on a patch.

Some thoughts on API design:

1) https://github.com/apache/druid/pull/10730 adds some related
functionality that you would want to hook into. This patch added Java APIs
that can be used in extensions, but didn't add any JSON APIs that can be
used by regular users. But you could build some JSON APIs on top of this.

2) Some keys are "formatted" (like the examples you gave: json and
delimited). Formatted keys should be parsed and fields extracted from them
somehow, using their own InputFormat. Maybe we should call it the
"keyInputFormat". We need to figure out what semantics make the most sense
for presenting the parsed key to later stages of the system (which expect a
single namespace). Merging the parsed key map with the parsed value map
seems like a bad idea, since there might be field name collisions. So maybe
we should prefix them with some string like "__key.". There could still be
collisions, but they'd be less likely if we choose an uncommon prefix. At
some point, we may also need to let users specify their own prefix, or even
something fancier like an explicit mapping. But I think we won't need that
feature on day 1.
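
To make the prefixing idea in (2) concrete, here is a rough sketch in plain Java. It is not Druid code and doesn't use any Druid APIs: the class name, the method name, and the choice to throw on a collision are all just assumptions for illustration, and "__key." is only the example prefix floated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueMerge {
    // Example prefix from the discussion above; the final choice is still open.
    public static final String KEY_PREFIX = "__key.";

    /**
     * Returns a new map with every value field, plus every key field
     * renamed to KEY_PREFIX + original name. Inputs are not modified.
     */
    public static Map<String, Object> merge(Map<String, Object> parsedKey,
                                            Map<String, Object> parsedValue) {
        Map<String, Object> merged = new LinkedHashMap<>(parsedValue);
        for (Map.Entry<String, Object> e : parsedKey.entrySet()) {
            String name = KEY_PREFIX + e.getKey();
            // Collisions are still possible if the value already has a field
            // with the prefixed name. Failing loudly is one possible policy
            // (an assumption here), not a decided behavior.
            if (merged.containsKey(name)) {
                throw new IllegalArgumentException("Field name collision: " + name);
            }
            merged.put(name, e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> key = Map.of("id", "user-1");
        Map<String, Object> value = Map.of("id", 42, "action", "click");
        // Both "id" fields survive: the value's as "id", the key's as "__key.id".
        System.out.println(merge(key, value));
    }
}
```

The point is just that later stages still see a single flat namespace, and a key field and a value field with the same name no longer clobber each other.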

3) There are also unformatted keys that might be simple strings or byte
arrays. These unformatted keys should become a single field. I’m not sure
which is more prevalent, or which one we should build first, but I think
ultimately we’ll want to support both styles.
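
For the unformatted case in (3), a similarly rough sketch, again plain Java rather than Druid code. The field name "__key", the class and method names, and the decision to decode the bytes as UTF-8 are assumptions for illustration; a real design might keep the raw bytes or let the user pick an encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawKeyField {
    // Hypothetical name for the single field that holds an unformatted key.
    public static final String KEY_FIELD = "__key";

    /**
     * Returns a new map with every value field, plus the raw key exposed as
     * one string field. Kafka record keys may be null; in that case no key
     * field is added.
     */
    public static Map<String, Object> withRawKey(byte[] rawKey,
                                                 Map<String, Object> parsedValue) {
        Map<String, Object> row = new LinkedHashMap<>(parsedValue);
        if (rawKey != null) {
            // Assumes the key is UTF-8 text; arbitrary byte arrays would need
            // a different representation.
            row.put(KEY_FIELD, new String(rawKey, StandardCharsets.UTF_8));
        }
        return row;
    }

    public static void main(String[] args) {
        byte[] key = "user-1".getBytes(StandardCharsets.UTF_8);
        Map<String, Object> value = Map.of("action", "click");
        System.out.println(withRawKey(key, value));
    }
}
```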

On Fri, Apr 16, 2021 at 3:36 PM noam shaish <noamsha...@gmail.com> wrote:

> Hi,
> I would like to try to add an InputFormat for Kafka that also supports fields
> coming from the event key.
> In my scenario there are two options:
> 1. both key and value are json
> 2. key is delimited string and the value is json.
>
> Would such a feature be welcome as a contribution, or should I keep it on
> my own fork?
>
> Thanks,
> Noam
