[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079010#comment-16079010 ]
Kazuaki Ishizaki commented on SPARK-21344:
------------------------------------------

I will work on this if no one has already finished a PR.

> BinaryType comparison does signed byte array comparison
> -------------------------------------------------------
>
>                 Key: SPARK-21344
>                 URL: https://issues.apache.org/jira/browse/SPARK-21344
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.1
>            Reporter: Shubham Chopra
>
> BinaryType used by Spark SQL defines ordering using signed byte comparisons.
> This can lead to unexpected behavior. Consider the following code snippet
> that shows this error:
> {code}
> case class TestRecord(col0: Array[Byte])
> def convertToBytes(i: Long): Array[Byte] = {
>     val bb = java.nio.ByteBuffer.allocate(8)
>     bb.putLong(i)
>     bb.array
> }
> def test = {
>     val sql = spark.sqlContext
>     import sql.implicits._
>     val timestamp = 1498772083037L
>     val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
>     val testDF = sc.parallelize(data).toDF
>     val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
>     val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
>     val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))
>     assert(filter1.count == 50)
>     assert(filter2.count == 50)
>     assert(filter3.count == 100)
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
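
For readers following the report, here is a minimal standalone sketch (plain Scala, no Spark) of why comparing bytes as signed values can misorder big-endian encoded longs. The object name ByteOrderingSketch and the helpers compareBytes, signed and unsigned are hypothetical names introduced only for this illustration. They are not Spark's implementation and not necessarily the fix that was eventually merged. Only convertToBytes mirrors the helper in the report.

```scala
import java.nio.ByteBuffer

object ByteOrderingSketch {
  // Same idea as the helper in the report: big-endian encoding of a Long.
  def convertToBytes(i: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(i).array

  // Lexicographic comparison of two byte arrays.
  // `cmp` decides how two individual bytes are ordered.
  def compareBytes(a: Array[Byte], b: Array[Byte])(cmp: (Byte, Byte) => Int): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = cmp(a(i), b(i))
      if (c != 0) return c
      i += 1
    }
    // All shared bytes are equal: the shorter array sorts first.
    a.length - b.length
  }

  // Signed view: 0x80 is read as -128, so it sorts before 0x7f (127).
  val signed: (Byte, Byte) => Int = (x, y) => x - y

  // Unsigned view: mask to 0..255 first, so 0x80 (128) sorts after 0x7f (127).
  val unsigned: (Byte, Byte) => Int = (x, y) => (x & 0xff) - (y & 0xff)

  def main(args: Array[String]): Unit = {
    val a = convertToBytes(127L) // last byte is 0x7f
    val b = convertToBytes(128L) // last byte is 0x80
    println(compareBytes(a, b)(signed) > 0)   // signed order puts 127 after 128
    println(compareBytes(a, b)(unsigned) < 0) // unsigned order matches numeric order
  }
}
```

The range in the report covers 1001 consecutive longs, so the lowest byte necessarily crosses from 0x7f to 0x80 several times. At each crossing a signed comparison inverts the order, which is consistent with the range filters returning unexpected counts.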