[GitHub] [camel-k] nicolaferraro commented on issue #1980: Add support for multiple data types and schemas in Kamelets

GitBox Thu, 04 Feb 2021 01:09:45 -0800


nicolaferraro commented on issue #1980:
URL: https://github.com/apache/camel-k/issues/1980#issuecomment-773151586



   Let's do another iteration on this...
   
   I've been thinking about your comments and I like the idea of also having
these things as CRs. I remember some brainstorming with @lburgazzoli about how
dynamic schemas may work in this model. The idea was to let Kamelets define
their schemas, if known in advance, but also let KameletBindings redefine them,
if needed.
   
   DataFormats are generic in Camel, but when talking about connectors (a.k.a.
Kamelets), I think it's better for the Kamelet to enumerate all the
dataformats it supports. E.g. @davsclaus was talking about sources that can
only produce `binary` data (i.e. no dataformat), but there are many other
examples: a "hello world" string cannot be transformed into FHIR data by
simply plugging in the FHIR JSON dataformat, and not all data is suitable
for CSV encoding.
   
   I also see that we're talking about formats and schemas as if they were the 
same thing, but even if they are related (i.e. dataFormat + Kamelet [+ Binding 
Properties] may imply a Schema), maybe we can do a better job in treating them 
as separate entities.
   
   I think the following model may be good for the in-Kamelet specification of 
a "format":
   
   ```yaml
   kind: Kamelet
   apiVersion: camel.apache.org/v1alpha1
   metadata:
     name: chuck-source
   # ... 
   spec:
     definition:
       properties:
         format:
           title: Format
           type: string
           enum:
           - JSON
           - Avro
           default: JSON
   # ... 
   formats:
   - name: JSON
     # optional, useful in case of in/out Kamelets
     scope: out
     schema:
       mediaType: "application/json"
       data: # the JSON schema inline
       url: # alternative link to the schema
       ref: # alternative Kubernetes reference to the schema (see below)
         name: # ...
     # the source produces JSON by default, no libs or transformations needed
   
   - name: Avro
     schema:
       type: avro-schema
       mediaType: "application/avro"
       data: # the avro schema inline
       url: # alternative link to the schema
       ref: # alternative Kubernetes reference to the schema (see below)
         name: # ...
     dataFormat:
       # optional, but if not provided "no format" is assumed
       id: "avro"
       properties: # only if "id" is present
         class-name: org.apache.camel.xxx.MyClass
         compute-schema: true|false
         # ...
       dependencies:
       - camel:jackson
       - camel:avro
       - mvn:org.acme/my-artifact/1.0.0
   
   ```
   
   Note the `scope` property, which allows defining the specific details of
the transformations for the input and the output of a particular format. I'd
rather not complicate things, and would assume that users choose only one
format using the standard `format` property (not an `inputFormat` and an
`outputFormat`). So if I choose `CSV`, the Kamelet will consume and produce
CSV. Still, the shape (schema) of the input CSV can differ from that of the
output CSV (and that's described in the Kamelet).
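   
   A rough sketch of how the single `format` choice with per-scope details
might look for an in/out Kamelet. Everything here is an assumption made for
illustration: the `CSV` entries, the `in` scope value (only `out` appears in
the model above), the `text/csv` media type and the `camel:csv` dependency are
not part of the proposal itself.
   
   ```yaml
   # hypothetical fragment of an in/out Kamelet, same level as "formats" above
   formats:
   - name: CSV
     # assumed counterpart of "out": describes what the Kamelet consumes
     scope: in
     schema:
       mediaType: "text/csv"
       data: # schema describing the columns of the input CSV
     dataFormat:
       id: "csv"
       dependencies:
       - camel:csv
   - name: CSV
     # describes what the Kamelet produces for the same user-chosen format
     scope: out
     schema:
       mediaType: "text/csv"
       data: # schema describing the columns of the output CSV
     dataFormat:
       id: "csv"
       dependencies:
       - camel:csv
   ```
   
   With something like this, the user still sets only `format: CSV` in the
binding, and the two entries carry the different input and output shapes.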
   
   The `schema` here is declared inline in the Kamelet, to make it
self-contained, but we can also create a `Schema` CR:
   
   ```yaml
   kind: Schema
   apiVersion: camel.apache.org/v1alpha1
   metadata:
     name: my-avro-schema
   spec:
     type: avro-schema
     mediaType: application/avro
     data: # the avro schema inline
     url: # alternative URL reference
     # no, ref is forbidden here
   ```
   
   The structure is almost the same as the inline version.
   
   The binding can use the predefined schema:
   
   ```yaml
   kind: KameletBinding
   apiVersion: camel.apache.org/v1alpha1
   metadata:
     name: chuck-to-channel
   spec:
     source:
       kind: Kamelet
       apiVersion: camel.apache.org/v1alpha1
       name: chuck-source
       properties:
         # may have been omitted, since it's the default
         format: JSON
     sink:
       # ...
   ```
   
   The binding above will produce objects in JSON format with the inline
definition of the schema. The one below uses a custom schema:
   
   ```yaml
   kind: KameletBinding
   apiVersion: camel.apache.org/v1alpha1
   metadata:
     name: chuck-to-channel
   spec:
     source:
       kind: Kamelet
       apiVersion: camel.apache.org/v1alpha1
       name: chuck-source
       properties:
         # since there's no inline format named "my-avro", it refers to the external one
         format: Avro
       schema:
         # since it's a source, we assume this is the schema of the output
         ref:
           name: my-avro-schema
         # or alternatively also inline
         data: #...
         url: # ...
     sink:
       # ...
   ```
   
   This mechanism may also be used in cases where the schema can be computed
dynamically before running the integration. In this case, an external entity
saves the schema in a CR and references it in the KameletBinding.
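   
   As an illustration only, this is roughly what such an external entity could
write before the integration starts. It assumes that `data` holds the schema
as an inline string, and the `Joke` record with its fields is made up for the
example:
   
   ```yaml
   kind: Schema
   apiVersion: camel.apache.org/v1alpha1
   metadata:
     # same name that the KameletBinding references via "ref"
     name: my-avro-schema
   spec:
     type: avro-schema
     mediaType: application/avro
     # assumption: the computed schema is stored inline as a string
     data: |
       {
         "type": "record",
         "name": "Joke",
         "fields": [
           {"name": "id", "type": "string"},
           {"name": "value", "type": "string"}
         ]
       }
   ```
   
   The binding itself would not need to change when the schema is recomputed,
since it only points at the CR by name.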
   
   Using the Schema CR to sync external entities (like registries) is
possible, but we should think more about it because of edge cases: sometimes
the schema is known only at runtime and sometimes it varies from message to
message. In those cases, it's the integration itself that needs to update the
registries. It would probably be cleaner if the integration were always the
one that updates the registry.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
