Hi
Thanks for the answer. My simulator includes many parallel state machines, and each of them generates a log file (with timestamps). Finally, all events (rows) of all the log files should be combined, in time order, into one very large log file. In practice, the combined log file can also be split into smaller ones.

What transformation or action functions can I use in Spark for that purpose? And are there any code samples (Python or Scala) for that?

Regards
Esa Heikkinen

________________________________
From: Jörn Franke <jornfra...@gmail.com>
Sent: 20 June 2017 17:12
To: Esa Heikkinen
Cc: user@spark.apache.org
Subject: Re: Using Spark as a simulator

It is fine, but you have to design it so that the generated rows are written in large blocks for optimal performance. The trickiest part of data generation is the conceptual part, such as the probabilistic distributions etc. You also have to check that you use a good random generator; for some cases the Java built-in one may not be good enough.

On 20 Jun 2017, at 16:04, Esa Heikkinen <esa.heikki...@student.tut.fi> wrote:

Hi

Spark is a data analyzer, but would it be possible to use Spark as a data generator or simulator? My simulation can be very large, and I think a parallelized simulation using Spark (in the cloud) could work. Is that a good or bad idea?

Regards
Esa Heikkinen
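The merge question at the top of the thread is left open, so here is a minimal, hedged sketch using the PySpark RDD API: `textFile` (which accepts a glob), the transformation `sortBy`, and the action `saveAsTextFile`. The paths, the log-line format (a leading `YYYY-MM-DD HH:MM:SS` timestamp) and the partition count are assumptions, not something stated in the thread.

```python
# Hedged sketch: merge many timestamped log files into one globally
# time-ordered output with the PySpark RDD API. The paths and the
# log-line format below are assumptions.

def event_time(line):
    """Sort key: the leading 'YYYY-MM-DD HH:MM:SS...' timestamp fields."""
    return line.split(" ", 2)[:2]

if __name__ == "__main__":
    from pyspark import SparkContext

    sc = SparkContext(appName="merge-simulator-logs")
    # textFile() accepts a glob, so one call reads every state machine's log.
    logs = sc.textFile("hdfs:///sim/logs/*.log")       # hypothetical path
    # sortBy() range-partitions the data, so the output part files are both
    # internally time-ordered and ordered relative to each other;
    # numPartitions controls how many "smaller" files the combined log
    # is split into.
    merged = logs.sortBy(event_time, numPartitions=32)
    merged.saveAsTextFile("hdfs:///sim/logs-merged")   # hypothetical path
    sc.stop()
```

The Scala equivalent uses the same method names on `SparkContext` and `RDD`. If the timestamp is a fixed-width prefix of every line, sorting the lines themselves lexicographically would also work, and the explicit key function can be dropped.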
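On Jörn's points about data generation and random generators: a small, hedged sketch of per-state-machine event generation in plain Python. One explicitly seeded `random.Random` per machine keeps the parallel streams independent and reproducible; the generator could be driven in parallel from Spark with `sc.parallelize(range(num_machines)).flatMap(...)`. The state names, event format and timing model here are invented for illustration.

```python
import random
from datetime import datetime, timedelta

def generate_events(machine_id, n_events, seed, start=datetime(2017, 6, 20)):
    """Yield n_events timestamped log rows for one state machine.

    A dedicated, explicitly seeded random.Random per machine makes each
    parallel stream reproducible and independent. (CPython's generator is
    the Mersenne Twister: fine for simulation, not for cryptography.)
    """
    rng = random.Random(seed)
    t = start
    states = ["IDLE", "RUN", "WAIT", "DONE"]  # hypothetical state names
    for _ in range(n_events):
        t += timedelta(milliseconds=rng.randint(1, 1000))  # random gap
        yield "%s SM%d %s" % (t.isoformat(sep=" "), machine_id, rng.choice(states))

# Rows come out already in time order for a single machine.
rows = list(generate_events(machine_id=1, n_events=3, seed=42))
```

Because each machine's stream is generated in timestamp order, materializing a whole partition's rows before writing (e.g. with `mapPartitions`) also gives the large write blocks Jörn recommends.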