[ https://issues.apache.org/jira/browse/FLINK-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536560#comment-16536560 ]
ASF GitHub Bot commented on FLINK-9407:
---------------------------------------
Github user zhangminglei commented on a diff in the pull request:
https://github.com/apache/flink/pull/6075#discussion_r200888987
--- Diff: flink-connectors/flink-orc/pom.xml ---
@@ -54,6 +54,14 @@ under the License.
<optional>true</optional>
</dependency>
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-connector-filesystem_${scala.binary.version}</artifactId>
+			<version>${project.version}</version>
+			<!-- Projects depending on this project, won't depend on flink-filesystem. -->
+			<optional>true</optional>
+		</dependency>
+
		<dependency>
			<groupId>org.apache.orc</groupId>
			<artifactId>orc-core</artifactId>
--- End diff --
Yes. We can upgrade it. Will update.
> Support orc rolling sink writer
> -------------------------------
>
> Key: FLINK-9407
> URL: https://issues.apache.org/jira/browse/FLINK-9407
> Project: Flink
> Issue Type: New Feature
> Components: filesystem-connector
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Major
> Labels: pull-request-available
>
> Currently, we only support {{StringWriter}}, {{SequenceFileWriter}} and
> {{AvroKeyValueSinkWriter}}. I would suggest adding an ORC writer for the
> rolling sink.
> FYI, results below.
> I tested the PR and verified the results with Spark SQL: the data we read
> back matches what was written. I will add more tests over the next couple
> of days, including performance under compression with short checkpoint
> intervals, and more unit tests.
> {code:java}
> scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
> res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
> scala>
> scala> res1.registerTempTable("tablerice")
> warning: there was one deprecation warning; re-run with -deprecation for details
> scala> spark.sql("select * from tablerice")
> res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
> scala> res3.show(3)
> +-----+---+-------+
> | name|age|married|
> +-----+---+-------+
> |Sagar| 26| false|
> |Sagar| 30| false|
> |Sagar| 34| false|
> +-----+---+-------+
> only showing top 3 rows
> {code}
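Aside: the {{2018-07-06--21}} directory in the HDFS path above is the hourly bucket created by the rolling sink's date-time bucketing (Flink's {{DateTimeBucketer}} defaults to a {{yyyy-MM-dd--HH}} pattern). A minimal sketch of how such a bucket path is derived; the base path and timestamp here are illustrative, not taken from the PR:

```python
from datetime import datetime, timezone

def bucket_path(base_path: str, ts: datetime, fmt: str = "%Y-%m-%d--%H") -> str:
    """Build an hourly bucket directory, mirroring a yyyy-MM-dd--HH pattern."""
    return f"{base_path}/{ts.strftime(fmt)}"

# An element processed at 21:15 UTC on 2018-07-06 lands in the ...--21 bucket.
p = bucket_path("hdfs://10.199.196.0:9000/data/hive/man",
                datetime(2018, 7, 6, 21, 15, tzinfo=timezone.utc))
print(p)  # hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21
```

Every element that arrives within the same hour is written under the same bucket directory, which is why a single spark.read.orc over that directory sees all rows written in that hour.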
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)