Michael Armbrust created SPARK-1678:
---------------------------------------
Summary: Compression loses repeated values.
Key: SPARK-1678
URL: https://issues.apache.org/jira/browse/SPARK-1678
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Assignee: Cheng Lian
Priority: Blocker
Fix For: 1.0.0
Here's a test case:
{code}
test("all the same strings") {
  sparkContext.parallelize(1 to 1000)
    .map(_ => StringData("test"))
    .registerAsTable("test1000")
  // Before caching: all 1000 rows are returned.
  assert(sql("SELECT * FROM test1000").count() === 1000)
  cacheTable("test1000")
  // After caching: the count should be unchanged, but this assert fails.
  assert(sql("SELECT * FROM test1000").count() === 1000)
}
{code}
The first assert passes; the second one, which runs after cacheTable, fails.