Single machine? Any other framework will perform better than Spark.

On Tue, 19 Jun 2018 at 09:40, Aakash Basu <aakash.spark....@gmail.com> wrote:
> Georg, just asking: can pandas handle such a big dataset if that data is
> then passed into any of the sklearn modules?
>
> On Tue, Jun 19, 2018 at 10:35 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>
>> Use pandas or dask.
>>
>> If you do want to use Spark, store the dataset as Parquet/ORC, and then
>> continue to perform analytical queries on that dataset.
>>
>> Raymond Xie <xie3208...@gmail.com> wrote on Tue, 19 Jun 2018 at 04:29:
>>
>>> I have a 3.6 GB CSV dataset (4 columns, 100,150,807 rows); my environment
>>> is a 20 GB SSD hard disk and 2 GB of RAM.
>>>
>>> The dataset contains:
>>> User ID: 987,994
>>> Item ID: 4,162,024
>>> Category ID: 9,439
>>> Behavior type ('pv', 'buy', 'cart', 'fav')
>>> Unix timestamp: spanning November 25 to December 3, 2017
>>>
>>> I would like to hear any suggestions from you on how I should process
>>> the dataset with my current environment.
>>>
>>> Thank you.
>>>
>>> ------------------------------------------------
>>> Sincerely yours,
>>>
>>> Raymond