[I] Harmonize data set and data stream API (streampipes)

via GitHub Tue, 14 Feb 2023 22:54:08 -0800


tenthe opened a new issue, #1289:
URL: https://github.com/apache/streampipes/issues/1289


   ### Discussed in https://github.com/apache/streampipes/discussions/1115
   
   <div type='discussions-op-text'>
   
   <sup>Originally posted by **tenthe** January 17, 2023</sup>
   
   
   ## Harmonize `data set` and `data stream` APIs
   
   We are currently looking at the Connect API and plan to refactor parts of 
it. While reviewing the current implementation, I noticed several design 
decisions that make it more complex than necessary.
   
   ## Distinction between `data set` and `data stream` adapters
   
   For example, we distinguish between `data set` and `data stream` adapters. 
Set adapters are treated as bounded streams, i.e. they stream a data set only 
once. Originally, this was added because it allows the user to replay existing 
events (e.g., from databases or files). However, I don't think this feature is 
used very often and we only have three implementations of set adapters. This 
feature adds a lot of overhead in many different places, such as the UI, the 
core, and extension services.
   
   ## Main features of current data sets
   
   Data sets are currently used mainly for two purposes:
   - Validating the processing elements in the e2e tests
   - Importing a data set (e.g. a CSV file) into the time-series storage
   
   Both use cases are important and we should definitely keep them, but maybe 
we can find another way to support them.
   
   ## Alternative solutions
   
   New functionality:
   - Add an option to create adapters without starting them
   - Add an option to the `FileStreamAdapter` to play the file only once
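
To illustrate the second option, here is a minimal Python sketch of a file replay with a "play once" switch. It is not the `FileStreamAdapter` implementation; the function name `replay_events` and the flag `replay_once` are made up for this example, and a CSV string stands in for the uploaded file.

```python
import csv
import io
from itertools import cycle, islice
from typing import Iterator


def replay_events(csv_text: str, replay_once: bool) -> Iterator[dict]:
    """Yield one event per CSV row (hypothetical helper, not StreamPipes API).

    With replay_once=True the rows are emitted a single time (bounded replay).
    With replay_once=False the rows are repeated endlessly (continuous stream).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if replay_once:
        yield from rows
    else:
        yield from cycle(rows)


data = "ts,value\n1,10\n2,20\n"

# Bounded replay: the generator ends after the last row.
once = [event["value"] for event in replay_events(data, replay_once=True)]

# Continuous replay: take only the first five events of the endless stream.
looped = [event["value"] for event in islice(replay_events(data, replay_once=False), 5)]
```

With such a switch, a file-based stream adapter could cover the "import a CSV file once" use case without a separate set adapter type.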
   
   To import a dataset, a user (or the e2e tests) would need to create an 
adapter without starting it, create the pipeline, and then start the adapter.
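
The order of these steps matters for a file that is played only once: if the adapter started before the pipeline existed, the events would be gone before anything could consume them. The following Python sketch models that ordering. The `Adapter` class and `import_dataset` function are hypothetical stand-ins, not StreamPipes classes, and a plain list plays the role of the pipeline's data sink.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Adapter:
    """Hypothetical adapter that is created in a stopped state."""
    events: List[dict]
    running: bool = False
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def start(self) -> None:
        # Emit every event once to all current subscribers, then stop.
        self.running = True
        for event in self.events:
            for deliver in self.subscribers:
                deliver(event)
        self.running = False


def import_dataset(events: List[dict]) -> List[dict]:
    storage: List[dict] = []                     # stand-in for the time-series storage
    adapter = Adapter(events=events)             # 1. create the adapter, not started
    adapter.subscribers.append(storage.append)   # 2. create the pipeline (sink only)
    adapter.start()                              # 3. start the adapter
    return storage


# Starting before a pipeline is attached loses the one-time replay.
lost: List[dict] = []
early = Adapter(events=[{"v": 1}])
early.start()
early.subscribers.append(lost.append)
```

In this model `import_dataset` stores every event, while `lost` stays empty because the adapter finished before its consumer was attached.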
   
   ## Recommendation
   
   Since the data set API offers few benefits, I would recommend removing it. 
This would also give StreamPipes a clearer focus on streaming data produced by 
machines. Further, it would simplify the implementation in many places without 
any loss of functionality.
   What do you think?
   
   
   PS: I would also like to harmonize the model for `GenericAdapters` and 
`SpecificAdapters`, but that is another discussion ;).
   
   Cheers,
   Philipp</div>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
