Hi Davies,

Thanks for the quick fix. I'm sorry to send out a bug report on release day
- 1.1.0 really is a great release.  I've been running the 1.1 branch for a
while and there's definitely lots of good stuff.

For the workaround, I think you may have meant:

srdd2 = SchemaRDD(srdd._jschema_rdd.coalesce(N, False, None), sqlCtx)

Note:
"_schema_rdd" -> "_jschema_rdd"
"false" -> "False"

That workaround seems to work fine: I've observed the correct number of
partitions in the web UI, although I haven't tested it beyond that.

Thanks!
-Brad

On Thu, Sep 11, 2014 at 11:30 PM, Davies Liu <dav...@databricks.com> wrote:

> This is a bug; I had created an issue to track it:
> https://issues.apache.org/jira/browse/SPARK-3500
>
> Also, there is PR to fix this: https://github.com/apache/spark/pull/2369
>
> Before next bugfix release, you can workaround this by:
>
> srdd = sqlCtx.jsonRDD(rdd)
> srdd2 = SchemaRDD(srdd._schema_rdd.coalesce(N, false, None), sqlCtx)
>
>
> On Thu, Sep 11, 2014 at 6:12 PM, Brad Miller <bmill...@eecs.berkeley.edu>
> wrote:
> > Hi All,
> >
> > I'm having some trouble with the coalesce and repartition functions for
> > SchemaRDD objects in pyspark.  When I run:
> >
> > sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}',
> > '{"foo":"baz"}'])).coalesce(1)
> >
> > I get this error:
> >
> > Py4JError: An error occurred while calling o94.coalesce. Trace:
> > py4j.Py4JException: Method coalesce([class java.lang.Integer, class
> > java.lang.Boolean]) does not exist
> >
> > For context, I have a dataset stored in a parquet file, and I'm using
> > SQLContext to make several queries against the data.  I then register the
> > results of these queries as new tables in the SQLContext.  Unfortunately
> > each new table has the same number of partitions as the original (despite
> > being much smaller).  Hence my interest in coalesce and repartition.
> >
> > Has anybody else encountered this bug?  Is there an alternate workflow I
> > should consider?
> >
> > I am running the 1.1.0 binaries released today.
> >
> > best,
> > -Brad
>
