Hi Philipp,

Yes, I think it makes sense to have a single service for handling files.
When writing the CSVMetadataEnrichment component for Chris, I started adding 
simple file management to the backend and also extended the SDK with methods to 
retrieve files from the backend (see CsvMetadataEnrichmentController and 
FileServingResource in the backend).

We could extend this, move the file management into a separate microservice, 
and put a simple API in front of it that can be used by all services that 
need to store or retrieve files (e.g., also for the included assets of 
pipeline elements, which could be documentation, icons or ML models).
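
To make the idea a bit more concrete, here is a minimal sketch of what the core 
of such a file service could look like. It is written in Python only for 
brevity; the class FileStore and its methods store, retrieve and list_ids are 
hypothetical names for illustration, not the existing backend or SDK API, and 
a real service would put a REST layer in front of this.

```python
import tempfile
import uuid
from pathlib import Path


class FileStore:
    """Hypothetical core of a file service: keeps all files in one directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, filename: str, content: bytes) -> str:
        # A random prefix keeps equal file names from different services apart.
        # Path(filename).name drops any directory part of the given name.
        file_id = f"{uuid.uuid4().hex}_{Path(filename).name}"
        (self.root / file_id).write_bytes(content)
        return file_id

    def retrieve(self, file_id: str) -> bytes:
        # Path(file_id).name keeps the lookup inside the storage directory.
        return (self.root / Path(file_id).name).read_bytes()

    def list_ids(self):
        # Sorted list of the ids of all stored files.
        return sorted(p.name for p in self.root.iterdir() if p.is_file())


# Example: a pipeline element stores an icon and reads it back later.
store = FileStore(Path(tempfile.mkdtemp()))
icon_id = store.store("icon.png", b"placeholder icon bytes")
icon_bytes = store.retrieve(icon_id)
```

Callers only ever see the returned id, so the storage location behind the API 
could be changed later without touching the services that use it.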

Concerning HDFS, in my opinion this might be an option, but since we don't yet 
have very large amounts of data to store, it would probably be overkill here 
(one more distributed system to manage).

Dominik

-----Original Message-----
From: Philipp Zehnder <[email protected]> 
Sent: Tuesday, February 18, 2020 6:28 PM
To: [email protected]
Subject: STREAMPIPES-75: Extend data lake sink to store images

Hi all,

I finished the implementation that stores images as files instead of Base64 
strings in InfluxDB.

For the first version I mounted a local volume and store the images in a folder 
on this volume. 
I think this is a good starting point because the images are stored in a local 
volume on the same host as the sink.
Now the question is how users can access those images. I would suggest 
extending the data lake REST API for that.
To do this, the backend must mount the same volume as the internal sink 
container running the data lake sink.
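
As a rough sketch of this setup (not the actual sink code), the two sides could 
look like the following. The function names save_image and load_image, the 
folder layout and the .png extension are assumptions made for illustration 
only.

```python
import base64
import tempfile
from pathlib import Path


def save_image(volume: Path, measurement: str, timestamp_ms: int, image_b64: str) -> str:
    """Sink side: decode a Base64 image and write it below the mounted volume."""
    target_dir = volume / measurement
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{timestamp_ms}.png"  # file extension is an assumption
    target.write_bytes(base64.b64decode(image_b64))
    # Only this short relative path would be stored in the database.
    return target.relative_to(volume).as_posix()


def load_image(volume: Path, relative_path: str) -> bytes:
    """Backend side: read the image from the same volume for a REST response."""
    return (volume / relative_path).read_bytes()


# Example with a temporary directory standing in for the shared volume.
volume = Path(tempfile.mkdtemp())
encoded = base64.b64encode(b"placeholder image bytes").decode("ascii")
stored_path = save_image(volume, "camera1", 1582046880000, encoded)
```

Both functions take the volume root as a parameter, so the sink and the backend 
can mount the volume at different paths and still agree on the relative path.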

Does anyone have an alternative solution?

@Dominik, you already implemented a StreamPipes-internal file storage. Could 
we use that for the images as well, or would the frequency of incoming images 
be too high?

@all What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a 
shared service between multiple containers.


Philipp
