Re: DataSink for Redis

Philipp Zehnder Tue, 12 May 2020 21:34:23 -0700

Hi Grainier,

thank you! I directly merged the pull request with the docker-compose file.


@Patrick, what else do we have to add when we want to use Redis in Kubernetes?
Do we also have to add a template in [1], or is it sufficient to have
the docker-compose file?

Philipp

[1] https://github.com/apache/incubator-streampipes-installer/tree/dev/helm-chart/templates/optional-external-services

On 2020/05/13 03:01:37, Grainier Perera <[email protected]> wrote: 
> Hi Philipp,
> 
> I've created an issue [1] and added a docker-compose file for Redis in
> PR[2]. Please review and merge.
> 
> [1] https://issues.apache.org/jira/browse/STREAMPIPES-124
> [2] https://github.com/apache/incubator-streampipes-installer/pull/6
> 
> Thanks,
> Grainier.
> 
> On Wed, 13 May 2020 at 02:01, Philipp Zehnder <[email protected]> wrote:
> 
> > Hi Grainier,
> >
> > your PR looks very good.
> > Do you have a docker-compose file for Redis?
> > I would like to add it to our CLI [1] in the service directory.
> >
> > This makes it easy for StreamPipes users to set up an instance and use your
> > new sink.
> > A user just has to add ‘redis’ to the system file and the container is
> > then started with the rest of the system.
> > We already provide docker-compose files for other DBs.
> >
> > Philipp
> >
> > [1] https://github.com/apache/incubator-streampipes-installer/tree/dev/cli
> >
> > > On 12. May 2020, at 18:09, Grainier Perera <[email protected]> wrote:
> > >
> > > Hi Philipp,
> > >
> > > I agree with your opinion on the key-field. So I've modified it with an
> > > option to either use auto-increment or use an existing event field as the
> > > key field [1]. Now it will have a radio button to select True/False on
> > > auto-increment. And if it's True, key-field will be ignored and a
> > > sequential numeric key will be used. Otherwise, it'll use the selected
> > > field as the key field.
> > >
> > > > When it comes to use-cases, a user can:
> > > >
> > > >   1. Store the last event per asset (asset id as the key-field,
> > > >   auto-increment disabled, index -1).
> > > >   2. Collect all the events per asset for diagnostics, replaying,
> > > >   etc. (auto-increment enabled, different index per asset). An index
> > > >   is like a separate DB with a distinct keyspace, independent from
> > > >   the others [2].
> > > >   3. Collect recent events with data purging (similar to 1 and 2, but
> > > >   with an expiration time).
> > >
> > > > So this new approach would allow all the above scenarios. What do
> > > > you think?
> > >
> > > [1] https://github.com/apache/incubator-streampipes-extensions/pull/13
> > > [2] https://www.mikeperham.com/2015/09/24/storing-data-with-redis/
> > >
> > > Regards,
> > > Grainier.
> > >
> > > > On Tue, 12 May 2020 at 12:36, Philipp Zehnder <[email protected]> wrote:
> > >
> > > >> Hi Grainier,
> > >>
> > >> the sink looks very cool and I merged your PR.
> > >>
> > >> I have a question regarding the key field.
> > >>
> > >> Currently users can either select ‘-‘ or a ‘runtimeName’ as a
> > >> requiredTextParameter.
> > >> When ‘-‘ is selected a unique counter is used for the key, right?
> > > >> The problem is when a user selects a ‘runtimeName’ we cannot provide
> > > >> any input validation.
> > > >> If the primaryKey is not within the event the user will see an error
> > > >> when the pipeline is started and has to go back and edit the pipeline.
> > >>
> > > >> Alternatively we could use a mapping property for the key field, then
> > > >> the user would see a drop-down menu of all event properties and could
> > > >> select one.
> > >> This way we can ensure that the key is within the event, but then we do
> > >> not have the chance to select ‘-‘.
> > >>
> > > >> What do you think is a common use case for the Redis sink?
> > > >> Could a use case for Redis be to store the last event per asset (e.g.
> > > >> sensor or machine)?
> > > >> For that, we could use the mapping property solution and further extend
> > > >> it with a dimension property requirement.
> > > >> Then users can select a property representing an identifier (e.g. a
> > > >> machine id; for each machine an entry would be created in Redis).
> > >>
> > >>
> > >> What do you think?
> > >>
> > >> Philipp
> > >>
> > >>
> > >>
> > > >>> On 11. May 2020, at 17:51, Grainier Perera <[email protected]> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > > >>> I've sent PR [1] with the initial implementation. Please review and
> > > >>> merge.
> > >>>
> > >>> [1] https://github.com/apache/incubator-streampipes-extensions/pull/12
> > >>>
> > >>> Thanks,
> > >>> Grainier.
> > >>>
> > > >>> On Mon, 11 May 2020 at 01:20, Dominik Riemer <[email protected]> wrote:
> > >>>
> > >>>> Hi Grainier,
> > >>>>
> > >>>> very cool! A Redis sink would be awesome.
> > >>>> Since I haven't worked a lot with Redis in the past, I don't have a
> > >>>> strong opinion, just some thoughts:
> > >>>> I guess the answer depends on how users will use events stored in
> > >>>> Redis, whether they will need to access single fields or the whole
> > >>>> event. I'd guess that most users will access whole events, which
> > >>>> would lead to option 1.
> > >>>> Maybe we could start with 1 and later on add an option in the pipeline
> > >>>> element configuration where users can switch between both options?
> > >>>>
> > >>>> I'll be happy to help you with the SDK in case you have any questions - I
> > >>>> know that our documentation has some potential for improvement, so
> > >>>> feel free to ask 😉
> > >>>>
> > >>>> Dominik
> > >>>>
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Grainier Perera <[email protected]>
> > >>>> Sent: Sunday, May 10, 2020 6:20 PM
> > >>>> To: [email protected]
> > >>>> Subject: DataSink for Redis
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I'm planning to implement a data sink that forwards and stores events
> > >>>> in Redis[1][2]. But I'd like to get some feedback and opinions from
> > >>>> you before proceeding.
> > >>>>
> > >>>> The question that I have is: since Redis is merely a key-value store,
> > >>>> and we have a structured event to be persisted, what would the key and
> > >>>> value be? The following are the possible approaches[3]:
> > >>>>
> > >>>> 1. Store the entire object as a JSON-encoded string in a single key.
> > >>>>
> > >>>>     SET event:{id} '{"sensorId":"001", "temp":28}'
> > >>>>
> > >>>>  - Pro: faster when accessing all the fields of the event at once.
> > >>>>  - Pro: works with nested objects (but I don't think we have any
> > >>>>  nested objects).
> > >>>>  - Pro: can set the TTL.
> > >>>>  - Con: slower when accessing a single field or a subset of fields of
> > >>>>  the event.
> > >>>>  - Con: JSON parsing is required to retrieve fields. However, it's
> > >>>>  quite fast.
> > >>>>
> > >>>>
> > >>>> 2. Store each Object's properties in a Redis hash.
> > >>>>
> > >>>>     HMSET event:{id} sensorId "001"
> > >>>>     HMSET event:{id} temp "28"
> > >>>>
> > >>>>  - Pro: can set the TTL.
> > >>>>  - Pro: no need to parse JSON strings.
> > >>>>  - Pro: faster when accessing a single field or a subset of fields of
> > >>>>  the event.
> > >>>>  - Con: slower when accessing all the fields of the event.
> > >>>>
> > >>>>
> > >>>> 3. Store each Object as a JSON string in a Redis hash.
> > >>>>
> > >>>>     HMSET events {id1} '{"sensorId":"001", "temp":28}'
> > >>>>     HMSET events {id2} '{"sensorId":"002", "temp":32}'
> > >>>>
> > >>>>  - Pro: fewer keys to work with.
> > >>>>  - Con: can't set the TTL.
> > >>>>  - Con: JSON parsing is required to retrieve fields.
> > >>>>  - Con: slower when accessing a single field or a subset of fields of
> > >>>>  the event.
> > >>>>
> > >>>>
> > >>>> 4. Store each property of each Object in a dedicated key.
> > >>>>
> > >>>>     SET event:{id}:sensorId "001"
> > >>>>     SET event:{id}:temp 28
> > >>>>
> > >>>>  - Pro: can set the TTL per field (but it's not necessary for our
> > >>>>  scenario).
> > >>>>  - Pro: no need to parse JSON strings.
> > >>>>  - Pro: faster when accessing a single field or a subset of fields of
> > >>>>  the event.
> > >>>>  - Con: slower when accessing all the fields of the event.
> > >>>>
> > >>>>
> > >>>> 5. Use RedisJSON[4][5] module and store each event as a JSON.
> > >>>>
> > >>>>     JSON.SET event . '{"sensorId":"001", "temp":28}'
> > >>>>
> > >>>>
> > >>>>  - Pro: faster manipulation of JSON documents.
> > >>>>  - Pro: faster when accessing single/multiple fields of the event.
> > >>>>  - Pro: can set the TTL.
> > >>>>  - Con: requires RedisJSON module.
> > >>>>
> > >>>>
> > >>>> IMO, 1 & 2 would be the best choices given that they both allow a TTL
> > >>>> for purging. What do you think is best? Your feedback is highly
> > >>>> appreciated.
> > >>>>
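As a rough comparison of options 1 and 2, the sketch below only builds the Redis commands as tuples and does not talk to a server. The helper names as_json_string and as_hash are hypothetical. It uses HSET with several field-value pairs instead of HMSET, which Redis has treated as deprecated since 4.0, and it assumes a TTL given in seconds.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Command = Tuple[str, ...]


def as_json_string(key: str, event: Dict[str, Any],
                   ttl_seconds: Optional[int] = None) -> List[Command]:
    """Option 1: one string key holding the JSON-encoded event."""
    command: Command = ("SET", key, json.dumps(event, sort_keys=True))
    if ttl_seconds is not None:
        command += ("EX", str(ttl_seconds))  # SET accepts the TTL inline
    return [command]


def as_hash(key: str, event: Dict[str, Any],
            ttl_seconds: Optional[int] = None) -> List[Command]:
    """Option 2: one hash per event, one hash field per event property."""
    fields: List[str] = []
    for name, value in sorted(event.items()):
        fields += [name, str(value)]  # hash values are stored as strings
    commands: List[Command] = [("HSET", key, *fields)]
    if ttl_seconds is not None:
        commands.append(("EXPIRE", key, str(ttl_seconds)))  # TTL needs a second command
    return commands


event = {"sensorId": "001", "temp": 28}
json_commands = as_json_string("event:1", event, ttl_seconds=60)
hash_commands = as_hash("event:1", event, ttl_seconds=60)
```

One visible difference: option 1 keeps the numeric type of temp inside the JSON document, while option 2 stores it as the string "28", so a reader has to convert it back.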
> > >>>> [1] https://redis.io/
> > >>>> [2] https://issues.apache.org/jira/browse/STREAMPIPES-121
> > >>>> [3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
> > >>>> [4] https://redislabs.com/redis-enterprise/redis-json/
> > >>>> [5] https://oss.redislabs.com/redisjson/
> > >>>>
> > >>>> Regards,
> > >>>> Grainier.
> > >>>>
