Padova Apache Spark Meetup

2018-09-05 Thread Matteo Durighetto
events in this mailing list? Kind regards, Matteo Durighetto

Re: Writing a DataFrame is taking too long and huge space

2018-03-09 Thread Matteo Durighetto
Hello, try using the Parquet format with compression (such as snappy or lz4), so that the output files are smaller and writing them generates less I/O. Parquet is also normally faster to read than CSV in subsequent operations. Another possible format is ORC. Kind regards, Matteo
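As a minimal sketch of the suggestion above, assuming PySpark and an existing DataFrame: the helper name `write_compressed`, its parameters, and the paths in the usage comments are hypothetical and only for illustration. It relies on the standard `DataFrameWriter` calls `format`, `option("compression", ...)`, `mode` and `save`. Which codecs are accepted depends on the Spark version and the chosen format, so check the documentation for your release before picking lz4.

```python
def write_compressed(df, path, fmt="parquet", codec="snappy", mode="error"):
    """Write a Spark DataFrame to `path` in a compressed columnar format.

    `df` is expected to be a pyspark.sql.DataFrame. The default save mode
    "error" makes the write fail if `path` already exists.
    """
    # Only the two columnar formats mentioned in this thread are allowed here.
    if fmt not in ("parquet", "orc"):
        raise ValueError(f"unsupported format: {fmt!r}")
    (
        df.write
        .format(fmt)                   # "parquet" or "orc"
        .option("compression", codec)  # e.g. "snappy"; availability varies by version
        .mode(mode)
        .save(path)
    )


# Usage, assuming `df` already exists (paths are placeholders):
# write_compressed(df, "/tmp/output_parquet")
# write_compressed(df, "/tmp/output_orc", fmt="orc", codec="zlib")
```

Compared with CSV, both formats store data by column and compress each column separately, which is where the smaller files and faster later reads come from.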