Yeah, it depends on what you want to do with that time-series data. At Datadog we process trillions of points daily using Spark. I can't really go into exactly what we do with the data, but Spark can handle the volume, scale well, and be fault-tolerant, albeit everything I said comes with multiple asterisks.
On Thursday, May 24, 2018, amin mohebbi <aminn_...@yahoo.com.invalid> wrote:

> Could you please help me understand the performance we would get from
> using Spark with any NoSQL or TSDB? We receive 1 mil meters x 288 readings
> = 288 mil rows (approx. 360 GB per day), so we will end up with 10s or
> 100s of TBs of data, and I feel that NoSQL will be much quicker than
> Hadoop/Spark. This is time-series data coming from many devices in the
> form of flat files, and it is currently extracted/transformed/loaded into
> another database which is connected to BI tools. We might use Azure Data
> Factory to collect the flat files, then use Spark to do the ETL (not sure
> if that is the correct way), then use Spark to join tables or do the
> aggregations and save them into a db (preferably NoSQL, not sure).
> Finally, deploy Power BI to visualize the data from the NoSQL db.
>
> My questions are:
>
> 1- Is the above-mentioned architecture correct? I think the combination
> of Spark with NoSQL could give us random access and let many different
> users run queries.
> 2- Do we really need to use a time-series db?
>
> Best Regards,
> Amin Mohebbi
> PhD candidate in Software Engineering at University of Malaysia
> Tel: +60 18 2040 017
> E-Mail: tp025...@ex.apiit.edu.my
> amin_...@me.com

-- 
Sent from my iPhone