Hi Takeshi,
Thank you for your comment. I changed it to RDD and it's a lot better.
Zhuo
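For readers finding this thread in the archive: a minimal sketch of the kind of change described above (joining via the RDD API instead of the Dataset API). The schema, case classes, and session setup below are assumptions for illustration, not details from the thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema -- the real one from the thread is not shown.
case class Item(sku: String, qty: Int)
case class Order(id: Long, items: Seq[Item])
case class Customer(id: Long, name: String)

object RddJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("rdd-join-sketch").getOrCreate()
    import spark.implicits._

    val orders    = Seq(Order(1L, Seq(Item("a", 2)))).toDS()
    val customers = Seq(Customer(1L, "Ann")).toDS()

    // Dataset join: each matched pair is encoded as internal rows,
    // nested `items` included, and written to the shuffle.
    val viaDataset = orders.joinWith(customers, orders("id") === customers("id"))

    // RDD join -- roughly what "changed it to RDD" means here: key both
    // sides by id and join the plain JVM objects instead.
    val viaRdd = orders.rdd.keyBy(_.id).join(customers.rdd.keyBy(_.id))

    viaRdd.collect().foreach(println)
    spark.stop()
  }
}
```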
On Fri, Nov 25, 2016 at 7:04 PM, Takeshi Yamamuro wrote:
> Hi,
>
> I think this is just the overhead of representing nested elements as internal
> rows at runtime
> (e.g., it consumes
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Why-is-shuffle-write-size-so-large-when-joining-Dataset-with-nested-structure-tp28136.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
changing the schema? If not, what is
the best practice when designing complex schemas?
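On the best-practice question, one tactic that avoids shuffling the nested side entirely is to broadcast the small side of the join. This is an illustrative sketch, not advice from the thread; the table and column names below are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("broadcast-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical tables: `events` carries a nested struct column.
    val events = Seq((1L, ("click", Seq("a", "b")))).toDF("id", "payload")
    val users  = Seq((1L, "Ann")).toDF("id", "name")

    // broadcast() ships the small side to every executor, so the
    // wide nested side is never written to the shuffle at all.
    val joined = events.join(broadcast(users), "id")

    joined.show()
    spark.stop()
  }
}
```

Spark also broadcasts automatically when the small side is under `spark.sql.autoBroadcastJoinThreshold`; the hint just makes it explicit.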