Hi,

You are right, Griffin can persist metrics to different sinks such as
Elasticsearch and HDFS; for accuracy measurements the missing records are also
written to HDFS. The storage requirement depends on your data size: the
metrics themselves are always small, while the missing records can be large
when accuracy is poor, up to the size of the whole data source if every record
is mismatched.
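For example, the sinks section of the measure environment config (e.g.
env-batch.json) could look roughly like the sketch below; the ES endpoint and
the HDFS path are just placeholders, and the exact field names can differ
between Griffin versions, so please check the measure configuration guide:

{
  "sinks": [
    {
      "type": "ELASTICSEARCH",
      "config": {
        "method": "post",
        "api": "http://es-host:9200/griffin/accuracy"
      }
    },
    {
      "type": "HDFS",
      "config": {
        "path": "hdfs:///griffin/persist",
        "max.persist.lines": 10000,
        "max.lines.per.file": 10000
      }
    }
  ]
}

The metric values go to every sink listed here, while the missing-record
samples are written under the HDFS path, so that path is the one whose size
grows when data is mismatched.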
I agree with William: normal metrics will not take much storage, and writing
metrics to HDFS is optional anyway. The memory of the Spark cluster just
depends on your data size; in our case, 10 workers with 8 GB of memory each
calculate the accuracy metric for 800M lines of data in about 3 minutes.
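To make that sizing concrete: if you submit the measure job through Livy (the
Griffin service keeps these settings in a sparkProperties.json, if I remember
correctly), the executor settings for our case would look roughly like the
sketch below; the jar path is a placeholder, and executorCores / driverMemory
are only illustrative values you should tune for your own workload:

{
  "file": "hdfs:///griffin/griffin-measure.jar",
  "className": "org.apache.griffin.measure.Application",
  "numExecutors": 10,
  "executorCores": 1,
  "executorMemory": "8g",
  "driverMemory": "1g"
}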
Storage is not the critical resource for Griffin, so HDFS capacity will not be
your limit, but a larger Spark cluster can speed up the calculation.


Thanks,
Lionel


On 07/17/2019 23:13, jose.martin_santacruz.ext wrote:
Hello William,

OK, but what would be the minimum and the recommended storage for the cluster
node where Apache Griffin is running?
Are the metrics always stored in Elasticsearch? In the documentation I have
seen that you can define different sinks for the metrics (HDFS, Elasticsearch,
MongoDB, ...).

Waiting for your answer

Regards

-----Original Message-----
From: William Guo <[email protected]>
Sent: Wednesday, July 17, 2019 17:01
To: [email protected]
Subject: Re: Apache Griffin storage requirements

hi,

There are no special storage requirements for Griffin; the storage depends on
your Spark jobs and the scale of your dataset.
We only temporarily store some intermediate cache in Spark and store the
metrics in Elasticsearch. The metrics should be small.


Thanks,
William


On Wed, Jul 17, 2019 at 10:54 PM <[email protected]> wrote:

> Hello,
>
> We are starting a new project with Apache Griffin and we need to know
> what the storage requirements for Griffin are. We have found no
> documentation about it; can you give us this information?
>
> Waiting for your answer.
>
> Regards
>
>
