Issue with pyspark 1.3.0, sql package and rows

Stefano Parmesan Tue, 07 Apr 2015 04:23:42 -0700

Hi all,

I opened a bug on Jira a few days ago [1], but I'm starting to think that
was not the right channel, since I haven't received any response to it
yet.


Let me try to explain it briefly: with pyspark, cogrouping two input
files with different schemas leads (nondeterministically) to wrong
behaviour: the objects coming from the first input end up with the field
names of the second one (or vice versa). The important point is that the
data in each row is actually correct; what is wrong is the content of
__FIELDS__ on the rows.

On the Jira issue I posted a small snippet that reproduces the problem
(it is a gist [2]).
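
For anyone who wants to check their own job for the same symptom, here is a
minimal sketch of a sanity check, separate from the gist. It walks cogroup
output shaped as (key, (left_rows, right_rows)) and reports rows whose field
names differ from the schema expected for their side. The helper names, the
User/Order stand-ins and the fallback attributes (__fields__, _fields) are my
own assumptions for illustration; only __FIELDS__ comes from the behaviour
described above. Field names are compared order-insensitively.

```python
from collections import namedtuple


def field_names(row):
    """Return the field names a row-like object reports about itself.

    Assumption: pyspark 1.3.0 rows keep their names in `__FIELDS__`;
    `__fields__` and namedtuple `_fields` are fallbacks so the helper
    can be tried without Spark.
    """
    for attr in ("__FIELDS__", "__fields__", "_fields"):
        names = getattr(row, attr, None)
        if names is not None:
            return tuple(names)
    raise TypeError("object exposes no field names: %r" % (row,))


def find_schema_mismatches(cogrouped, left_fields, right_fields):
    """Collect (key, side, reported_fields) for rows with unexpected fields."""
    expected = {
        "left": tuple(sorted(left_fields)),
        "right": tuple(sorted(right_fields)),
    }
    mismatches = []
    for key, (left_rows, right_rows) in cogrouped:
        for side, rows in (("left", left_rows), ("right", right_rows)):
            for row in rows:
                reported = tuple(sorted(field_names(row)))
                if reported != expected[side]:
                    mismatches.append((key, side, reported))
    return mismatches


# Hypothetical stand-ins for rows from two inputs with different schemas.
User = namedtuple("User", ["id", "name"])
Order = namedtuple("Order", ["id", "amount"])

healthy = [(1, ([User(1, "a")], [Order(1, 9.5)]))]
# Simulate the symptom: a left-side row that reports the right-side schema.
broken = [(1, ([Order(1, "a")], [Order(1, 9.5)]))]

print(find_schema_mismatches(healthy, User._fields, Order._fields))
print(find_schema_mismatches(broken, User._fields, Order._fields))
```

With Spark, the same function could presumably be fed the collected result
of a cogroup on a small sample, passing the field names you expect for each
input; I have not verified that against every kind of row object.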

Does this happen to others as well? Is it a known issue? Am I doing
something wrong?

Thank you all,

[1]: https://issues.apache.org/jira/browse/SPARK-6677
[2]: https://gist.github.com/armisael/e08bb4567d0a11efe2db

-- 
Dott. Stefano Parmesan
Backend Web Developer and Data Lover ~ SpazioDati s.r.l.
Via Adriano Olivetti, 13 – 4th floor
"Le Albere" district – 38122 Trento – Italy
