[ https://issues.apache.org/jira/browse/SPARK-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609638#comment-16609638 ]
Dongjoon Hyun edited comment on SPARK-12417 at 9/10/18 6:23 PM:
----------------------------------------------------------------

This is fixed since 2.0.0.
{code}
scala> spark.version
res0: String = 2.0.0

scala> Seq((1,2)).toDF("a", "b").write.option("orc.bloom.filter.columns", "*").orc("/tmp/orc200")

$ hive --orcfiledump /tmp/orc200/part-r-00007-d36ca145-1e23-4d3a-ba99-09506e4ed8cc.snappy.orc
...
Stripes:
  Stripe: offset: 3 data: 12 rows: 1 tail: 92 index: 1390
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 0 section BLOOM_FILTER start: 14 length 426
    Stream: column 1 section ROW_INDEX start: 440 length 24
    Stream: column 1 section BLOOM_FILTER start: 464 length 456
    Stream: column 2 section ROW_INDEX start: 920 length 24
    Stream: column 2 section BLOOM_FILTER start: 944 length 449
    Stream: column 1 section DATA start: 1393 length 6
    Stream: column 2 section DATA start: 1399 length 6
...
{code}


> Orc bloom filter options are not propagated during file write in spark
> ----------------------------------------------------------------------
>
>                 Key: SPARK-12417
>                 URL: https://issues.apache.org/jira/browse/SPARK-12417
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>            Assignee: Apache Spark
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: SPARK-12417.1.patch
>
>
> ORC bloom filters are supported by the version of Hive used in Spark 1.5.2.
> However, when an ORC file is written with the bloom filter option set, the
> option is not used. E.g., the following ORC write does not create bloom
> filters even though the options are specified:
> {noformat}
> Map<String, String> orcOption = new HashMap<String, String>();
> orcOption.put("orc.bloom.filter.columns", "*");
> hiveContext.sql("select * from accounts where effective_date='2015-12-30'").write().
>     format("orc").options(orcOption).save("/tmp/accounts");
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
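As background for the `BLOOM_FILTER` streams shown in the orcfiledump output above: a bloom filter answers "is this value possibly in the set?" with no false negatives, which lets a reader skip row groups whose filter rules a predicate value out. The sketch below is a minimal plain-Scala illustration of that idea only; it assumes nothing about ORC's actual on-disk filter layout or hash choices, and `SimpleBloomFilter` is a hypothetical name, not an ORC or Spark class.

```scala
import scala.util.hashing.MurmurHash3

// Minimal illustrative bloom filter (not ORC's implementation).
// k seeded hash probes set/check bits in a fixed-size bit set:
// mightContain == false  => value is definitely absent,
// mightContain == true   => value is possibly present (false positives allowed).
class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new java.util.BitSet(numBits)

  // One bit position per seeded hash; normalize to [0, numBits).
  private def positions(key: String): Seq[Int] =
    (0 until numHashes).map { seed =>
      val h = MurmurHash3.stringHash(key, seed)
      ((h % numBits) + numBits) % numBits
    }

  def add(key: String): Unit = positions(key).foreach(bits.set)

  def mightContain(key: String): Boolean = positions(key).forall(bits.get)
}

object BloomDemo {
  def main(args: Array[String]): Unit = {
    val bf = new SimpleBloomFilter(numBits = 1024, numHashes = 3)
    bf.add("2015-12-30")
    // No false negatives: an added key is always reported as possibly present.
    println(bf.mightContain("2015-12-30"))
  }
}
```

A reader using such a filter on an `effective_date` column could skip any stripe whose filter returns false for the queried date, at the cost of the extra `BLOOM_FILTER` stream bytes visible in the dump above.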