[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439231#comment-16439231 ]
liuxian commented on SPARK-23989:
---------------------------------

For `SortShuffleWriter`, `records: Iterator[Product2[K, V]]` is an iterator of key-value pairs, but each value is of type `UnsafeRow`. For example, when we insert the first record into `PartitionedPairBuffer`, we save only its `AnyRef` (the object reference). The value of the next record (only the value, not the key) has the same `AnyRef`, that is, it is the same object as the first record's value, so the first record is overwritten.

> When using `SortShuffleWriter`, the data will be overwritten
> ------------------------------------------------------------
>
>                 Key: SPARK-23989
>                 URL: https://issues.apache.org/jira/browse/SPARK-23989
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: liuxian
>            Priority: Critical
>
> When using `SortShuffleWriter`, we only insert the `AnyRef` into `PartitionedAppendOnlyMap` or `PartitionedPairBuffer`.
> For this function:
> override def write(records: Iterator[Product2[K, V]])
> the values in `records` are of type `UnsafeRow`, so the value will be overwritten.
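
The overwriting described in this report can be illustrated without Spark. The sketch below is only an analogy under stated assumptions: `MutableRow` is a hypothetical stand-in for `UnsafeRow`, `reusingIterator` is a hypothetical iterator that hands out the same mutable object for every record, and the `ArrayBuffer` plays the role of a buffer that stores only references. Copying each value before buffering is shown as one common way to avoid the aliasing; it is not claimed to be the fix adopted for this issue.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for UnsafeRow: a mutable record that can be copied.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object ReuseDemo {
  // Yields (key, row) pairs but reuses ONE row instance, mutating it per element.
  def reusingIterator(values: Seq[Int]): Iterator[(Int, MutableRow)] = {
    val shared = new MutableRow(0)
    values.iterator.zipWithIndex.map { case (v, i) =>
      shared.value = v // overwrites whatever earlier pairs still point to
      (i, shared)
    }
  }

  // Stores only the reference to each value, so every entry aliases `shared`.
  def bufferByReference(it: Iterator[(Int, MutableRow)]): Seq[Int] = {
    val buf = ArrayBuffer.empty[(Int, MutableRow)]
    it.foreach(kv => buf += kv)
    buf.map(_._2.value).toSeq
  }

  // Copies each value before storing it, so later mutations do not affect it.
  def bufferWithCopy(it: Iterator[(Int, MutableRow)]): Seq[Int] = {
    val buf = ArrayBuffer.empty[(Int, MutableRow)]
    it.foreach { case (k, row) => buf += ((k, row.copy())) }
    buf.map(_._2.value).toSeq
  }
}
```

With the input `Seq(1, 2, 3)`, `bufferByReference` ends up holding three references to the same object, so all buffered values read as the last one written, while `bufferWithCopy` preserves each value as it was when inserted.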