Re: Wide Datasets (v1.6.1)

2016-05-21 Thread Don Drake
I was able to verify that similar exceptions occur in Spark 2.0.0-preview.
I have created this JIRA: https://issues.apache.org/jira/browse/SPARK-15467

You mentioned using beans instead of case classes; do you have an example
(or test case) that I can see?
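(Editor's note for archive readers: a minimal, untested sketch of the bean approach, assuming Spark's `Encoders.bean`; the class and field names below are illustrative, not from the thread.)

```scala
import scala.beans.BeanProperty

// Illustrative wide-record bean (names are assumptions, not from the
// thread).  @BeanProperty generates the getter/setter pairs that
// Spark's Encoders.bean(classOf[WideRecord]) reflects on; beans avoid
// the case-class arity limit (22 fields in Scala 2.10, 254 in 2.11).
class WideRecord extends Serializable {
  @BeanProperty var id: String = _
  @BeanProperty var amount: Double = _
  // ...repeat for the remaining 200+ fields
}

// With Spark on the classpath (untested sketch):
//   val ds = sqlContext.createDataset(records)(Encoders.bean(classOf[WideRecord]))
```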

-Don

On Fri, May 20, 2016 at 3:49 PM, Michael Armbrust wrote:

> I can provide an example/open a Jira if there is a chance this will be
>> fixed.
>>
>
> Please do!  Ping me on it.
>
> Michael
>



-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake 
800-733-2143


Re: Wide Datasets (v1.6.1)

2016-05-20 Thread Michael Armbrust
>
> I can provide an example/open a Jira if there is a chance this will be
> fixed.
>

Please do!  Ping me on it.

Michael


Wide Datasets (v1.6.1)

2016-05-20 Thread Don Drake
I have been working to create a DataFrame that contains a nested
structure.  The first attempt is to create an array of structures.  I've
written previously on this list that this doesn't work in DataFrames in
1.6.1, but it does in 2.0.
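(Editor's note: the nested shape being described can be sketched as follows; the names and the `collect_list`/`struct` construction are assumptions for illustration, not taken from the thread.)

```scala
// Case classes describing one nested row: an array of structures.
case class Item(a: Int, b: String)
case class NestedRow(key: String, items: Seq[Item])

// In Spark 2.0 the same shape can be built on a DataFrame with
// (untested sketch, requires Spark on the classpath):
//   df.groupBy($"key")
//     .agg(collect_list(struct($"a", $"b")).as("items"))

val row = NestedRow("k1", Seq(Item(1, "x"), Item(2, "y")))
```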

I've continued my experimenting and have it working with Datasets in 1.6.1,
using ds.groupBy($"col").mapGroups().  This works great when the number
of columns is less than the maximum for a case class (22 in Scala 2.10, 254
in Scala 2.11).  However, while using a custom-written case class of 200+
fields, I ran into a Catalyst/Janino stack overflow exception (at runtime,
as it was attempting to compile my large class), so that doesn't work.
I can provide an example/open a Jira if there is a chance this will be
fixed.
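(Editor's note: a minimal sketch of the groupBy/mapGroups pattern described above; the types and per-group function are illustrative assumptions, not from the thread.)

```scala
case class Event(key: String, value: Int)
case class Rollup(key: String, values: Seq[Int])

// The per-group function that mapGroups would apply to each key's rows.
def rollup(key: String, rows: Iterator[Event]): Rollup =
  Rollup(key, rows.map(_.value).toSeq)

// With a Dataset[Event] ds in Spark 1.6 (untested sketch):
//   val nested: Dataset[Rollup] = ds.groupBy(_.key).mapGroups(rollup _)
```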

My question is the following: Datasets rely on case classes; if I have a
dataset with more than 254 fields (and I have a lot of them), how am I
supposed to use Datasets with these wide tables?  Am I forced to use
DataFrames?

Thanks.

-Don
