Hi Philipp,

yes, I think it makes sense to have a single service for handling files. When writing the CSVMetadataEnrichment component for Chris, I started to add simple file management to the backend and also extended the SDK with methods to receive files from the backend (see CsvMetadataEnrichmentController and FileServingResource in the backend).

We could extend this, isolate the file management into a separate microservice, and put a simple API in front of it that can be used by all services that need to store or retrieve files (e.g., also for the included assets of pipeline elements, which could be documentation, icons or ML models).

Concerning HDFS: in my opinion this might be an option, but as we don't have very large amounts of data to store so far, it would probably be a bit of overkill here (one more distributed system to manage).

Dominik

-----Original Message-----
From: Philipp Zehnder <[email protected]>
Sent: Tuesday, February 18, 2020 6:28 PM
To: [email protected]
Subject: STREAMPIPES-75: Extend data lake sink to store images

Hi all,

I finished the implementation to store images in files instead of as Base64 strings in InfluxDB. For the first version I mounted a local volume and added the images to a folder in this volume. I think this is a good starting point because the images are stored in a local volume on the same host as the sink.

Now the question is: how can users access those images? I would suggest extending the data lake REST API for that. For this, the backend must mount the same volume as the internal sink container that runs the data lake sink.

Does anyone have an alternative solution?

@Dominik, you already implemented a StreamPipes-internal file storage. Could we use that for the images as well, or would the frequency be too high?

@all What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a shared service between multiple containers.

Philipp
