You're right, it's the same behavior... Oh well... I wanted something easy :(

> On Jul 22, 2016, at 12:41 PM, Everett Anderson <ever...@nuna.com.invalid> wrote:
> 
> Actually, sorry, my mistake, you're calling
> 
> DataFrame df = sqlContext.createDataFrame(data,
>         org.apache.spark.sql.types.NumericType.class);
> 
> and giving it a list of objects which aren't NumericTypes, but the wildcards 
> in the signature let it happen.
> 
> I'm curious what'd happen if you gave it Integer.class, but I suspect it 
> still won't work because Integer may not have the bean-style getters.
> 
> 
> On Fri, Jul 22, 2016 at 9:37 AM, Everett Anderson <ever...@nuna.com> wrote:
> Hey,
> 
> I think what's happening is that you're calling this createDataFrame method 
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(java.util.List,%20java.lang.Class)>:
> 
> createDataFrame(java.util.List<?> data, java.lang.Class<?> beanClass) 
> 
> which expects a JavaBean-style class with get and set methods for the 
> members, but Integer doesn't have such getters.
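> 
> For example, a minimal bean-style wrapper would satisfy that signature. Just a sketch (the IntBean name and its "value" field are made up for illustration, not anything from Spark):
> 
> public static class IntBean implements java.io.Serializable {
>     private Integer value;
>     public Integer getValue() { return value; }   // the bean-style getter Spark looks for
>     public void setValue(Integer value) { this.value = value; }
> }
> 
> List<IntBean> beans = new ArrayList<>();
> for (Integer i : data) {
>     IntBean bean = new IntBean();
>     bean.setValue(i);
>     beans.add(bean);
> }
> DataFrame beanDf = sqlContext.createDataFrame(beans, IntBean.class);
> beanDf.show(); // one "value" column, inferred from the getter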
> 
> I bet there's an easier way if you just want a single-column DataFrame of a 
> primitive type, but one way that would work is to manually construct the Rows 
> using RowFactory.create() 
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/RowFactory.html#create(java.lang.Object...)>
>  and assemble the DataFrame from them, like this:
> 
> List<Row> rows = new ArrayList<>();
> for (Integer i : data) {
>     rows.add(RowFactory.create(i));
> }
> 
> StructType schema = DataTypes.createStructType(Collections.singletonList(
>      DataTypes.createStructField("int_field", DataTypes.IntegerType, true)));
> 
> DataFrame intDataFrame = sqlContext.createDataFrame(rows, schema);
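> 
> Regarding that easier way: if you're on Spark 1.6+, the Dataset API might do it in one step. Again just a sketch, assuming createDataset and Encoders.INT() exist in your version:
> 
> Dataset<Integer> ds = sqlContext.createDataset(data, Encoders.INT());
> DataFrame intDf = ds.toDF(); // should yield one column, named "value" by default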
> 
> 
> 
> On Fri, Jul 22, 2016 at 7:53 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> 
> 
> I am trying to build a DataFrame from a list, here is the code:
> 
> private void start() {
>     SparkConf conf = new SparkConf()
>             .setAppName("Data Set from Array")
>             .setMaster("local");
>     SparkContext sc = new SparkContext(conf);
>     SQLContext sqlContext = new SQLContext(sc);
> 
>     Integer[] l = new Integer[] { 1, 2, 3, 4, 5, 6, 7 };
>     List<Integer> data = Arrays.asList(l);
> 
>     System.out.println(data);
> 
>     DataFrame df = sqlContext.createDataFrame(data,
>             org.apache.spark.sql.types.NumericType.class);
>     df.show();
> }
> 
> My result is (unpleasantly):
> 
> [1, 2, 3, 4, 5, 6, 7]
> ++
> ||
> ++
> ||
> ||
> ||
> ||
> ||
> ||
> ||
> ++
> 
> I also tried with:
> org.apache.spark.sql.types.NumericType.class
> org.apache.spark.sql.types.IntegerType.class
> org.apache.spark.sql.types.ArrayType.class
> 
> I am probably missing something super obvious :(
> 
> Thanks!
> 
> jg