Replied in the ticket.

On Tue, Nov 8, 2016 at 11:36 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> SPARK-18367 <https://issues.apache.org/jira/browse/SPARK-18367>: limit()
> makes the lame walk again
>
> On Tue, Nov 8, 2016 at 5:00 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Hmm, it doesn’t seem like I can access the output of
>> df._jdf.queryExecution().hiveResultString() from Python, and until I can
>> boil the issue down a bit, I’m stuck with using Python.
>>
>> I’ll have a go at using regexes to strip some stuff from the printed
>> plans. The one that’s working for me to strip the IDs is #\d+L?.
>>
>> Nick
>>
>>
>> On Tue, Nov 8, 2016 at 4:47 PM Reynold Xin <r...@databricks.com> wrote:
>>
>> If you want to peek into the internals and do crazy things, it is much
>> easier to do it in Scala with df.queryExecution.
>>
>> For explain string output, you can work around the comparison simply by
>> doing replaceAll("#\\d+", "#x")
>>
>> similar to the patch here:
>> https://github.com/apache/spark/commit/fd90541c35af2bccf0155467bec8cea7c8865046#diff-432455394ca50800d5de508861984ca5R217
>>
>>
>> On Tue, Nov 8, 2016 at 1:42 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>> I’m trying to understand what I think is an optimizer bug. To do that,
>> I’d like to compare the execution plans for a certain query with and
>> without a certain change, to understand how that change is impacting the
>> plan.
>>
>> How would I do that in PySpark? I’m working with 2.0.1, but I can use
>> master if it helps.
>>
>> explain()
>> <http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.explain>
>> is helpful but is limited in two important ways:
>>
>> 1. It prints to screen and doesn’t offer another way to access the
>>    plan or capture it.
>> 2. The printed plan includes auto-generated IDs that make diffing
>>    impossible. e.g.
>>
>>    == Physical Plan ==
>>    *Project [struct(primary_key#722, person#550, dataset_name#671)
>>
>> Any suggestions on what to do? Any relevant JIRAs I should follow?
>>
>> Nick
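
For anyone following along, the regex workaround discussed in this thread can be sketched in Python roughly as below. This is only a minimal sketch, not anything from Spark itself: normalize_plan and diff_plans are hypothetical helper names, and the plan strings are assumed to have been captured already. One possible way to capture them is the private df._jdf.queryExecution().toString() handle, which is an internal API and may change between versions.

```python
import difflib
import re

# Matches the auto-generated expression IDs in printed plans,
# e.g. "#722" or "#550L" (the pattern quoted in this thread).
_EXPR_ID = re.compile(r"#\d+L?")


def normalize_plan(plan_text):
    """Replace every expression ID with the fixed placeholder '#x'."""
    return _EXPR_ID.sub("#x", plan_text)


def diff_plans(before, after):
    """Return a unified diff of two plan strings after ID normalization.

    An empty string means the plans differ only in their generated IDs.
    """
    a = normalize_plan(before).splitlines()
    b = normalize_plan(after).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, "before", "after", lineterm="")
    )


# Assumption (not exercised here): in PySpark the plan text might be
# obtained through the private Java handle, for example
#   plan_text = df._jdf.queryExecution().toString()
```

Called on the plan text from two runs, diff_plans(plan_a, plan_b) would then show only structural differences, since ID-only changes are normalized away before the comparison.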