[ https://issues.apache.org/jira/browse/SPARK-23178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334206#comment-16334206 ]
KIryl Sultanau edited comment on SPARK-23178 at 1/22/18 12:38 PM:
------------------------------------------------------------------

With the unsafe switch turned off, this example works fine:
{quote}.config("spark.kryo.unsafe", "false"){quote}
No null or incorrect IDs appear in either data set.

was (Author: kirills2006):
With the unsafe switch turned off, this example works fine:
{quote}.config("spark.kryo.unsafe", "false"){quote}

> Kryo Unsafe problems with count distinct from cache
> ---------------------------------------------------
>
>                 Key: SPARK-23178
>                 URL: https://issues.apache.org/jira/browse/SPARK-23178
>             Project: Spark
>          Issue Type: Bug
>      Components: Spark Core
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: KIryl Sultanau
>            Priority: Major
>     Attachments: Unsafe-issue.png
>
> val spark = SparkSession
>   .builder
>   .appName("unsafe-issue")
>   .master("local[*]")
>   .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .config("spark.kryo.unsafe", "true")
>   .config("spark.kryo.registrationRequired", "false")
>   .getOrCreate()
>
> val devicesDF = spark.read.format("csv")
>   .option("header", "true")
>   .option("delimiter", "\t")
>   .load("/data/Devices.tsv").cache()
>
> val gatewaysDF = spark.read.format("csv")
>   .option("header", "true")
>   .option("delimiter", "\t")
>   .load("/data/Gateways.tsv").cache()
>
> val devJoinedDF = devicesDF.join(gatewaysDF, Seq("GatewayId"), "inner").cache()
> devJoinedDF.printSchema()
>
> println(devJoinedDF.count())
> println(devJoinedDF.select("DeviceId").distinct().count())
> println(devJoinedDF.groupBy("DeviceId").count().filter("count>1").count())
> println(devJoinedDF.groupBy("DeviceId").count().filter("count=1").count())

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
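For reference, the workaround described in the comment amounts to the following SparkSession setup: the reporter's repro configuration with spark.kryo.unsafe flipped to "false". This is a sketch assembled from the config calls quoted in the issue, not a verified fix from the Spark project; the app name and other settings are taken verbatim from the repro above.

```scala
import org.apache.spark.sql.SparkSession

// Same session as the repro, but with Kryo's unsafe IO disabled.
// Per the comment, this makes the distinct/group-by counts on the
// cached join come back consistent (no null or duplicated DeviceId).
val spark = SparkSession
  .builder
  .appName("unsafe-issue")
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.unsafe", "false")               // workaround: "true" triggers the bug
  .config("spark.kryo.registrationRequired", "false")
  .getOrCreate()
```

With this session, the rest of the repro (the cached reads, the join, and the four count statements) can be run unchanged to confirm the counts agree.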