Hi Takeshi,
Thank you for your comment. I changed it to RDD and it's a lot better.
Zhuo
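For readers finding this thread in the archive: a minimal sketch of the kind of change described above (joining via the RDD API instead of the Dataset API). The schema, case classes, and session setup below are assumptions for illustration, not details from the thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema -- the real one from the thread is not shown.
case class Item(sku: String, qty: Int)
case class Order(id: Long, items: Seq[Item])
case class Customer(id: Long, name: String)

object RddJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("rdd-join-sketch").getOrCreate()
    import spark.implicits._

    val orders    = Seq(Order(1L, Seq(Item("a", 2)))).toDS()
    val customers = Seq(Customer(1L, "Ann")).toDS()

    // Dataset join: each matched pair is encoded as internal rows,
    // nested `items` included, and written to the shuffle.
    val viaDataset = orders.joinWith(customers, orders("id") === customers("id"))

    // RDD join -- roughly what "changed it to RDD" means here: key both
    // sides by id and join the plain JVM objects instead.
    val viaRdd = orders.rdd.keyBy(_.id).join(customers.rdd.keyBy(_.id))

    viaRdd.collect().foreach(println)
    spark.stop()
  }
}
```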
On Fri, Nov 25, 2016 at 7:04 PM, Takeshi Yamamuro wrote:
> Hi,
>
> I think this is just the overhead of representing nested elements as internal
> rows at runtime
> (e.g., it consumes
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Why-is-shuffle-write-size-so-large-when-joining-Dataset-with-nested-structure-tp28136.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
changing the schema? If not, what is
the best practice when designing complex schemas?
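On the best-practice question, one tactic that avoids shuffling the nested side entirely is to broadcast the small side of the join. This is an illustrative sketch, not advice from the thread; the table and column names below are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("broadcast-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical tables: `events` carries a nested struct column.
    val events = Seq((1L, ("click", Seq("a", "b")))).toDF("id", "payload")
    val users  = Seq((1L, "Ann")).toDF("id", "name")

    // broadcast() ships the small side to every executor, so the
    // wide nested side is never written to the shuffle at all.
    val joined = events.join(broadcast(users), "id")

    joined.show()
    spark.stop()
  }
}
```

Spark also broadcasts automatically when the small side is under `spark.sql.autoBroadcastJoinThreshold`; the hint just makes it explicit.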