[ https://issues.apache.org/jira/browse/SPARK-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609638#comment-16609638 ]
Dongjoon Hyun edited comment on SPARK-12417 at 9/10/18 6:23 PM:
----------------------------------------------------------------

This is fixed since 2.0.0.
{code}
scala> spark.version
res0: String = 2.0.0

scala> Seq((1,2)).toDF("a", "b").write.option("orc.bloom.filter.columns", "*").orc("/tmp/orc200")

$ hive --orcfiledump /tmp/orc200/part-r-00007-d36ca145-1e23-4d3a-ba99-09506e4ed8cc.snappy.orc
...
Stripes:
  Stripe: offset: 3 data: 12 rows: 1 tail: 92 index: 1390
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 0 section BLOOM_FILTER start: 14 length 426
    Stream: column 1 section ROW_INDEX start: 440 length 24
    Stream: column 1 section BLOOM_FILTER start: 464 length 456
    Stream: column 2 section ROW_INDEX start: 920 length 24
    Stream: column 2 section BLOOM_FILTER start: 944 length 449
    Stream: column 1 section DATA start: 1393 length 6
    Stream: column 2 section DATA start: 1399 length 6
...
{code}


> Orc bloom filter options are not propagated during file write in spark
> ----------------------------------------------------------------------
>
>                 Key: SPARK-12417
>                 URL: https://issues.apache.org/jira/browse/SPARK-12417
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>            Assignee: Apache Spark
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: SPARK-12417.1.patch
>
>
> ORC bloom filters are supported by the version of Hive used in Spark 1.5.2.
> However, when an ORC file is written with the bloom filter option set, the
> option is not used. E.g., the following ORC write does not create bloom
> filters even though the options are specified:
> {noformat}
> Map<String, String> orcOption = new HashMap<String, String>();
> orcOption.put("orc.bloom.filter.columns", "*");
> hiveContext.sql("select * from accounts where effective_date='2015-12-30'").write().
>     format("orc").options(orcOption).save("/tmp/accounts");
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
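As background for the `BLOOM_FILTER` streams shown in the orcfiledump output above: a bloom filter answers "is this value possibly in the set?" with no false negatives, which lets a reader skip row groups whose filter rules a predicate value out. The sketch below is a minimal plain-Scala illustration of that idea only; it assumes nothing about ORC's actual on-disk filter layout or hash choices, and `SimpleBloomFilter` is a hypothetical name, not an ORC or Spark class.

```scala
import scala.util.hashing.MurmurHash3

// Minimal illustrative bloom filter (not ORC's implementation).
// k seeded hash probes set/check bits in a fixed-size bit set:
// mightContain == false  => value is definitely absent,
// mightContain == true   => value is possibly present (false positives allowed).
class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new java.util.BitSet(numBits)

  // One bit position per seeded hash; normalize to [0, numBits).
  private def positions(key: String): Seq[Int] =
    (0 until numHashes).map { seed =>
      val h = MurmurHash3.stringHash(key, seed)
      ((h % numBits) + numBits) % numBits
    }

  def add(key: String): Unit = positions(key).foreach(bits.set)

  def mightContain(key: String): Boolean = positions(key).forall(bits.get)
}

object BloomDemo {
  def main(args: Array[String]): Unit = {
    val bf = new SimpleBloomFilter(numBits = 1024, numHashes = 3)
    bf.add("2015-12-30")
    // No false negatives: an added key is always reported as possibly present.
    println(bf.mightContain("2015-12-30"))
  }
}
```

A reader using such a filter on an `effective_date` column could skip any stripe whose filter returns false for the queried date, at the cost of the extra `BLOOM_FILTER` stream bytes visible in the dump above.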