Yeah, it depends on what you want to do with that time-series data. At
Datadog we process trillions of points daily using Spark. I can't go into
exactly what we do with the data, but I can say that Spark handles the
volume, scales well, and is fault-tolerant, although everything I said
comes with multiple asterisks.
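At its core, the pipeline described below is a grouped aggregation. Here is a minimal plain-Python sketch of the per-meter daily rollup you would express as a groupBy/agg in Spark before writing results to the serving store (the schema, meter IDs, and the average-per-day metric are hypothetical, just for illustration):

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_rollup(readings):
    """Aggregate raw (meter_id, epoch_seconds, value) readings into
    per-meter daily averages. This is the same shape of computation
    you'd push down into Spark at scale; the logic here is only a
    single-machine illustration."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for meter_id, ts, value in readings:
        # Bucket each reading into its UTC calendar day.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        key = (meter_id, day)
        sums[key] += value
        counts[key] += 1
    # One output row per (meter, day): the mean of that day's readings.
    return {key: sums[key] / counts[key] for key in sums}

readings = [
    ("m1", 1527120000, 10.0),  # 2018-05-24 00:00 UTC
    ("m1", 1527120300, 14.0),  # same meter, five minutes later
    ("m2", 1527120000, 3.0),
]
rollup = daily_rollup(readings)
# rollup[("m1", "2018-05-24")] == 12.0
```

In Spark this collapses to roughly `df.groupBy("meter_id", "day").avg("value")`, and the resulting (much smaller) rollup table is what you'd land in the downstream database for BI queries, rather than the raw 360 GB/day.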

On Thursday, May 24, 2018, amin mohebbi <aminn_...@yahoo.com.invalid> wrote:

> Could you please help me understand the performance we would get from
> using Spark with a NoSQL store or a TSDB? We receive 1 million meters x
> 288 readings = 288 million rows (approx. 360 GB) per day, so we will end
> up with tens or hundreds of TBs of data, and I feel that NoSQL will be
> much quicker than Hadoop/Spark. This is time-series data coming from many
> devices in the form of flat files; it is currently extracted, transformed,
> and loaded into another database that is connected to BI tools. We might
> use Azure Data Factory to collect the flat files, then use Spark to do
> the ETL (not sure if that is the correct way), then use Spark to join
> tables or do the aggregations and save the results into a database
> (preferably NoSQL, but not sure). Finally, we would deploy Power BI to
> visualize the data from the NoSQL db.
> My questions are:
>
> 1- Is the above the correct architecture? I think using Spark with NoSQL
> could give us random access and let many different users run queries
> concurrently.
> 2- Do we really need to use a time-series db?
>
>
> Best Regards
> Amin Mohebbi
> PhD candidate in Software Engineering at University of Malaysia
> Tel : +60 18 2040 017
> E-Mail : tp025...@ex.apiit.edu.my
> amin_...@me.com
>

