Hi,
You are right, Griffin can persist metrics to different sinks such as Elasticsearch and HDFS (an illustrative sink configuration is sketched below the quoted thread); for accuracy measures, the missing records are also written to HDFS. The storage requirement depends on your data size: the metrics themselves are always small, while the missing records can be large if accuracy is poor, up to the size of the data source in the worst case where every record mismatches. I agree with William that normal metrics will not take much storage, and writing metrics to HDFS is optional. The memory needed in the Spark cluster likewise depends on your data size; in our case, 10 workers with 8 GB of memory each can compute the accuracy metric for 800M lines of data in about 3 minutes. Storage is not a scarce resource for Griffin, so HDFS will not be your limit, but a larger Spark cluster will improve performance.

Thanks,
Lionel

On 07/17/2019 23:13, jose.martin_santacruz.ext wrote:

Hello William,

OK, but what would be the minimum and the recommended storage for the cluster node where Apache Griffin is running? Are the metrics always stored in Elasticsearch? In the documentation I have seen that you can define different sinks for the metrics (HDFS, Elastic, MongoDB, ...).

Waiting for your answer.

Regards

-----Original Message-----
From: William Guo <[email protected]>
Sent: Wednesday, July 17, 2019 17:01
To: [email protected]
Subject: Re: Apache Griffin storage requirements

hi,

There are no special storage requirements for Griffin; the storage depends on your Spark jobs and the scale of your dataset. We only temporarily store some intermediate cache in Spark and store the metrics in Elasticsearch. Metrics should be small.

Thanks,
William

On Wed, Jul 17, 2019 at 10:54 PM <[email protected]> wrote:

> Hello,
>
> We are starting a new project with Apache Griffin and we need to know
> what the storage requirements for Griffin are. We have found no
> documentation about it; can you give us this information?
>
> Waiting for your answer.
>
> Regards
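As a rough sketch only of the multi-sink setup mentioned above (the paths, the Elasticsearch endpoint, and some field names here are illustrative assumptions and may differ between Griffin versions; please check the measure configuration guide for the exact schema), an environment config that sends metrics to several sinks could look roughly like this:

    {
      "spark": {
        "log.level": "WARN"
      },
      "sinks": [
        {
          "type": "console"
        },
        {
          "type": "hdfs",
          "config": {
            "path": "hdfs:///griffin/persist"
          }
        },
        {
          "type": "elasticsearch",
          "config": {
            "method": "post",
            "api": "http://<es-host>:9200/griffin/accuracy"
          }
        }
      ]
    }

In this sketch, the HDFS sink is the one that would also hold the (potentially large) missing-record output of accuracy measures, while Elasticsearch would only store the small metric documents, which matches the storage behaviour described in the thread.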
