[
https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664352#comment-15664352
]
Sean Owen commented on SPARK-17436:
-----------------------------------
See recent JIRAs about JavaAPISuite on Windows. I assume you might be on
Windows? develop on Linux or OS X if you can, since I don't think Windows
entirely works for development of Spark. I am not sure what other problems
you're having because you didn't say, but, again I can build and pass all tests
right now and so can Jenkins (again, modulo a non-trivial amount of test
flakiness)
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
> dataframe.write sometimes does not keep sorting
> -----------------------------------------------
>
> Key: SPARK-17436
> URL: https://issues.apache.org/jira/browse/SPARK-17436
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1, 1.6.2, 2.0.0
> Reporter: Ran Haim
>
> When using partition by, datawriter can sometimes mess up an ordered
> dataframe.
> The problem originates in
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.
> In the writeRows method when too many files are opened (configurable), it
> starts inserting rows to UnsafeKVExternalSorter, then it reads all the rows
> again from the sorter and writes them to the corresponding files.
> The problem is that the sorter actually sorts the rows using the partition
> key, and that can sometimes mess up the original sort (or secondary sort if
> you will).
> I think the best way to fix it is to stop using a sorter, and just put the
> rows in a map using key as partition key and value as an arraylist, and then
> just walk through all the keys and write it in the original order - this will
> probably be faster as there no need for ordering.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]